
Front.Comput.Sci. DOI

REVIEW ARTICLE

Information Retrieval: A View from the Chinese IR Community

Zhumin CHEN1, Xueqi CHENG2, Shoubin DONG3, Zhicheng DOU4, Binxing FANG5, Jiafeng GUO2, Xuanjing HUANG6, Yanyan LAN2, Chenliang LI7, Ru LI8, Tieyan LIU9, Yiqun LIU10, Jun MA1, Bing QIN11, Mingwen WANG12, Jirong WEN4, Jun XU2, Bo ZHANG10, Min ZHANG10, Peng ZHANG13, Qi ZHANG6, and Ming ZHOU9

1 Shandong University, Jinan 250100, China
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3 South China University of Technology, Guangzhou 510006, China
4 Renmin University of China, Beijing 100872, China
5 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
6 Fudan University, Shanghai 200433, China
7 Wuhan University, Wuhan 430072, China
8 Shanxi University, Taiyuan 200433, China
9 Microsoft Research Asia, Beijing 100080, China
10 Tsinghua University, Beijing 100084, China
11 Harbin Institute of Technology, Harbin 150001, China
12 Jiangxi Normal University, Nanchang 330022, China
13 Tianjin University, Tianjin 300072, China

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Abstract During a two-day strategic workshop in February 2018, 22 Information Retrieval researchers met to discuss the future challenges and opportunities within the field. The outcome is a list of potential research directions, project ideas, and challenges. This report describes the major conclusions we obtained during the workshop. A key result is that we need to open our minds and embrace a broader IR field by rethinking the definitions of information, retrieval, user, system, and evaluation in IR. By providing detailed discussions of these topics, this report is expected to inspire IR researchers in both academia and industry, and to help the future growth of the IR research community.

Keywords Up to 8 words separated by commas.

Received month dd, yyyy; accepted month dd, yyyy
E-mail: guojiafeng,[email protected] [email protected]

1 Introduction

The early idea of using computers to search for relevant pieces of information was popularized in the article ‘As We May Think’ by Vannevar Bush in 1945 [5]. During the six decades that followed, we have witnessed the boom of information retrieval (IR) in both industry and academia. Especially after Web search engines were invented, the rising need for advanced IR technologies led to a huge wave of IR research in our community. This was reflected in the growing numbers of IR-related conferences, workshops, and contests, as well as in the huge volume of ideas generated at those events.

However, during the past few years, many members of our community have raised the concern that IR research seems to be shrinking [9]. For example, the numbers of submissions to major IR conferences (e.g., SIGIR) are stable if not declining, while those to sibling conferences (e.g., KDD and ACL) have increased significantly. Many IR-related research topics, such as recommender systems, multimedia retrieval, and human-computer interaction, have faded out of IR venues and found new homes (e.g., RecSys, ICMR, CHI). Some closely related research areas, like natural language processing and data mining, have been driven by the new engine of Big Data and Artificial Intelligence, while the IR community, in contrast, seems to remain at its traditional pace.

Given all of this, it seems necessary for us to reflect on our current situation and to figure out the major challenges and opportunities that the community is facing. With this motivation, the Chinese IR community organized a strategy workshop on February 2nd and 3rd, 2018 to discuss future challenges and opportunities within the broad IR field. The goal is to open our minds by rethinking the definitions of information, retrieval, user, system, and evaluation. The expected output is a list of exciting and challenging future research directions to which we should devote our time and energy. In this paper, we report the major conclusions we obtained during this strategy workshop.

2 Redefining Information Retrieval

2.1 Motivations

Over the decades, IR research and applications have achieved great success. Especially since the invention of the computer, many solid IR technologies have emerged, for instance the inverted index [51], vector space models [38], probabilistic retrieval models [36], language models [29, 49], and the Cranfield evaluation methodology. Driven by the growth of the World Wide Web in the late 1990s, Web search became one of the main research areas in IR. Correspondingly, link analysis (e.g., PageRank [34] and HITS [25]), ranking signals based on query logs, and learning-to-rank techniques have been developed, which enabled us to leverage the interconnectivity of billions of web pages, the behaviors of millions of users, and the combination of thousands of signals to make IR systems stronger than ever.

However, at the same time, the gap between the Web data accessible to industry and to academia is getting wider. This limits the healthy evolution of the IR research community, especially in the Big Data and Artificial Intelligence (BigData+AI) era.

As we have noticed, in the BigData+AI era many research fields have been greatly boosted, e.g., machine learning, natural language processing, and computer vision. By comparison, however, our IR community seems to be losing its way somewhat. It would be timely and helpful for us to reflect seriously on the current situation of IR research and to ask whether the current components of IR, e.g., system architecture, retrieval models, user modeling, and evaluation methodologies, need to be refined or even redefined. With this motivation, we organized a workshop to collect the wisdom of crowds. We record some of our discussions during the workshop in the following sections, with the goal of pushing forward the renaissance of IR research.

2.2 Dimensions of Redefinition

During the discussions of this workshop, we considered the redefinition of seven dimensions of Information Retrieval. Here we summarize some basic ideas; details of some of them are discussed in the following sections.

• Redefine Information
  The term information does not only refer to documents or webpages, but also covers many other kinds of information. The richness of information can be characterized by its openness (e.g., private or public), its formats (e.g., webpages, microblogs, WeChat dialogues, or APPs), and its structure (e.g., free text, tables, or knowledge graphs).
• Redefine Scope of Retrieval
  The scope of retrieval should not be restricted to searching indexed documents/web pages. It should also cover searching user-generated information (e.g., microblog posts or CQA contents), combining different information resources, and reasoning from knowledge. Going even further, we should not restrict ourselves to retrieval, and should also think about recommendation, text analytics, question answering, text summarization, and chatbots.
• Redefine Retrieval Models
  Traditional retrieval models, which have been used for decades, should also be refined. For example, the target should be much broader than a listwise ranking result. Besides, more advanced technologies, e.g., deep learning, reinforcement learning, and adversarial learning, should be leveraged to enhance the ability of retrieval models. The general principle is to synchronize the design of retrieval models with the technical frontier of other fields, especially machine learning and artificial intelligence.
• Redefine Users
  Previously users were regarded as customers of IR systems; today they are also contributors. The interactions between users and IR systems should be emphasized and investigated under a new paradigm. Game theory can be used to model this user-system interplay, and corresponding learning algorithms, e.g., generative adversarial networks, reinforcement learning, and game-theoretic learning, should be leveraged.
• Redefine System Architecture
  The architecture of IR systems should be redefined or redesigned in order to index different types of data (structured or unstructured), integrate cloud computing technologies, have clearer interfaces, and build end-to-end systems.
• Redefine Evaluation
  We should emphasize the online and interactive nature of IR evaluation. A well-defined simulation system might be necessary to bridge the gap between offline and online evaluations.
• Others
  Apart from the aforementioned topics, we also need to consider other important factors, such as new IR theory, the interpretation of retrieval models, and so on.

Based on the above outline, we conducted extensive discussions on each aspect of redefinition as well as on related topics (except IR theory, which we believe requires more in-depth study in a dedicated paper). We summarize these detailed discussions as potentially exciting and challenging IR research topics for the next few years.

3 New Definition and Representation of Information

3.1 Motivation

Traditional information retrieval systems are mainly based on text retrieval. However, with the rapid development of the Internet, more and more data are generated in different ways. The data are large-scale, multi-source, heterogeneous, cross-domain, cross-media, cross-language, and dynamically evolving [8]. All these characteristics of big data bring new research topics and massive challenges to information retrieval.

3.2 Proposed Research Topics

Fusion of big data from heterogeneous sources

Data indexed by current information retrieval systems mainly come from news websites, video websites, image websites, etc. However, with the development of the Internet of Things and the mobile Internet, the data sources should also include wearable devices, mobile devices, smart home sensors, APP stores, etc. Based on these different kinds of data resources, an effective data sharing mechanism should be established for efficient retrieval.

Representation of indexed data to be retrieved and understood effectively

To index and retrieve big and complex data accurately, effective data representation is a necessary step. The traditional data representation in current retrieval systems is the one-hot representation, i.e. bag-of-words. A given Web page with different terms is represented as a vector in which each dimension corresponds to a term and holds either 0/1 or a weight computed by TF-IDF. The one-hot representation assumes that all dimensions are independent. As a result, the similarity relations between words are not captured accurately. With the development of representation learning, distributed representation denotes an object as a dense, real-valued, and low-dimensional vector. The object may be a letter, a word, a sentence, a document, an APP, a user, an item, a query, etc. A document can be represented by fusing different levels of granularity (character, word, sentence, passage, document). Once objects and queries are mapped into a unified space, it is easy to compute the similarity between them accurately.

Representation of extracted knowledge to answer users’ queries directly

After knowledge is extracted, it should be represented within a common semantic space in order to be used by information retrieval systems. At present, the knowledge graph is the most popular form of knowledge; it is represented as a series of triples, i.e. head entity, tail entity, and the relation between head and tail. With distributed representation learning, entities and relations are also denoted as dense, real-valued, and low-dimensional vectors. These vectors form a unified semantic space in which knowledge can be transferred across sources, domains, and models. As a result, next-generation retrieval systems can understand users’ queries better and retrieve relevant objects more accurately with the help of knowledge. Furthermore, for some difficult queries that cannot be matched from the Web directly, the knowledge graph can be used to infer the correct answers.
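As an illustration of how triples can live in a single vector space and support inference, a minimal sketch follows. It assumes a translation-based scoring scheme (in the style of TransE), which this report does not prescribe; the toy entities, the hand-set two-dimensional vectors, and the helper names `score` and `infer_tail` are all hypothetical.

```python
import math

# Hand-set toy embeddings (assumption: a real system would learn these from data).
ENTITY = {
    "Beijing": (1.0, 0.0),
    "China": (1.0, 1.0),
    "Paris": (3.0, 0.0),
    "France": (3.0, 1.0),
}
RELATION = {"capital_of": (0.0, 1.0)}


def score(head, relation, tail):
    """Translation-style plausibility: negative distance between head + relation and tail."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    translated = [hi + ri for hi, ri in zip(h, r)]
    return -math.dist(translated, t)


def infer_tail(head, relation):
    """Answer the query (head, relation, ?) by picking the best-scoring candidate tail."""
    candidates = [entity for entity in ENTITY if entity != head]
    return max(candidates, key=lambda tail: score(head, relation, tail))


if __name__ == "__main__":
    print(infer_tail("Paris", "capital_of"))
```

In this toy space a query such as "Paris is the capital of which country" is answered by vector arithmetic rather than by matching text, which is one way to read the claim that a knowledge graph can help with queries that cannot be matched from the Web directly.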

3.3 Research Challenges

With the development of the Internet and new technologies, next-generation retrieval systems will face several challenges.

(1) Cross source/domain/mode/language data indexing. Data to be indexed will come from many new sources such as wearable devices and sensors. How to design a new approach that aggregates and indexes data from multiple sources, domains, modes, languages, and views, so that diverse results can be retrieved effectively and accurately, is a challenge.

(2) Cross source/domain/mode/language data representation. How to learn distributed representations for different information objects or users’ information needs from multiple sources, domains, modes, languages, and views, considering their contents, relations, structures, and facets, so that the similarity between them can be computed accurately, is another challenge.

(3) Knowledge graph representation and indexing. Knowledge graphs are especially useful for open-domain information retrieval tasks. How best to construct, represent, index, and utilize knowledge graphs so that they are maximally useful for next-generation retrieval systems is also a challenge.

(4) Data sharing mechanism. How to build an open mechanism for sharing deeply processed, high-value data (private or public) among different research institutions and enterprises, under the protection of privacy and in order to satisfy different users’ needs, is a challenge, too.

4 Enlarged Scope of Retrieval

4.1 Motivation

Conventional information retrieval [39] rests, to some extent, on the underlying assumption that users’ information needs, represented by the keyword queries they issue, can be satisfied with a list of ranked documents. The documents are retrieved from a static document collection and ranked according to their relevance to the query. Based on this assumption, different relevance ranking models, also called retrieval models, have been proposed and successfully applied in search engines.

Recently, however, this assumption has been challenged by the development of the Web and the emergence of various approaches to accessing information. Researchers realize that information retrieval is not equal to document ranking or to retrieving a list of documents. The limitations of conventional IR models and systems include:

(1) Users must know in advance what information they need, and then try to pull the information from the static document set.

(2) Information retrieval systems are generally unidirectional and query-based. They are only able to respond to specific user requests. They can generally neither proactively generate information for users, nor even respond to queries in a user-specific fashion.

(3) The documents indexed in IR systems are relatively fixed, and the provided information is directly selected from the index. Any indirect knowledge available through analysis of current information, or implicit knowledge inherent in the patterns of information retrieval, cannot be exploited to enable the push of user-specific content or to enhance semantic representations of content.

(4) Presenting a list of documents ranked by relevance is far from the optimal way of representing retrieval results. To satisfy users’ information needs, an answer directly summarized from the relevant documents is a better output, which is beyond the current ranking-based optimization target.

Overall, to better satisfy users’ information needs and improve their search experience, IR systems are expected to go beyond simple relevance ranking of documents. There is growing demand to move from the existing information retrieval paradigm to general information access.

4.2 Proposed Research Topics

From active information retrieval to passive information retrieval

At present, IR systems assume that users know exactly in advance what information they need, and that they are able to summarize their information needs as keyword queries. This is usually called active information retrieval, as the users actively issue queries and wait for the systems to answer them with document lists. In active information retrieval, the burden on search users is high, because they need to express their search intent accurately and instantly as keyword queries. In many cases, this is hard or even impossible for search users.

In passive information retrieval, on the other hand, keyword queries are no longer necessary for pushing the information. The user is kept updated with new information after some initial configuration. For example, users may want to be notified if there are new publications on a particular research topic or new citations to particular papers. In some extreme cases, the retrieved resources are not merely stored statically but are reported more or less promptly to those who are interested or are assumed to be interested. For example, the information resources can be periodic reports generated by some information systems.

Going beyond retrieving a closed set of documents

Current IR systems are designed to retrieve from a static set of documents. Once a system is deployed, the indexed documents remain relatively fixed and are only updated periodically with newly retrieved documents. Users can get the most relevant documents, but these are always fetched by the IR system from a closed document set.

Ideally, an IR system should not only retrieve information from the closed document set, but also provide users with newly generated and modified information in order to answer their queries better.

Going from search engines to analytic engines

Existing search engines are designed for finding documents that may satisfy users’ information needs, given simple keywords as queries. Users need to browse and read the results and summarize the information contained in these documents by themselves. This usually costs much user effort when users have complex information needs, such as doing a survey on a research topic or learning about the latest progress of an event. It would be great if search engines could directly extract and aggregate relevant information nuggets (such as topics, events, person names, locations, organizations, etc.) from search results, and provide a kind of multi-dimensional interactive analytics to users. Users could click on a dimension item and drill down into the information they are interested in. This is a kind of text analytics service, like OLAP [7] in the database field. It could significantly reduce the effort users spend on reading and summarizing documents by themselves, and narrow the gap between the real user information need and the information returned by IR systems.

New functions for information retrieval

With the development of Internet services, there are many more diverse information needs that can be categorized into the area of information access, which can be an important research problem in the broader IR area. In addition to a traditional IR system, users may need other kinds of services, like recommendation, text analytics, question answering, text summarization, chatbots, etc. Different from the existing IR systems (such as search engines), which focus on retrieving relevant documents that already exist on the Web, these services may provide direct answers or higher-order knowledge to users, or move from the traditional ten blue links to conversational IR [4].

4.3 Research Challenges

There are several challenges when we go beyond the existing IR paradigm.

(1) Currently, the search engine is a successful IR application, and it significantly impacts the IR research community. New killer applications are needed to demonstrate the usefulness and effectiveness of these new IR approaches.

(2) How to effectively evaluate the new IR approaches is a challenge. It is usually hard to improve the quality of algorithms without a clear evaluation criterion. The complexity of the new IR functions proposed above is much higher than that of retrieving a simple document list, and we need to carefully design corresponding evaluation metrics for these tasks.

5 AI-Enhanced Retrieval Models

5.1 Motivation

IR models lie at the heart of the information retrieval research field. Different techniques have been proposed and applied in IR models, from traditional heuristic methods and probabilistic models to newer learning-to-rank techniques based on machine learning. Recently, with the advance of new AI technology, such as deep learning and reinforcement learning, many research areas have been pushed forward, including speech recognition [20], computer vision [26], and natural language processing [43]. This has led to expectations that these novel AI techniques are likely to deliver a similar scale of breakthroughs on IR tasks.

During the past few years, we have witnessed a growing body of work applying deep neural networks in IR models. There have been related workshops, such as the SIGIR Neu-IR workshops [10, 11], encouraging the discussion and development of new IR models with neural networks. However, we are still in the early days of leveraging AI techniques for IR models. Unlike in computer vision or natural language processing, few positive results have been reported in IR with new AI techniques. We are still waiting for exciting new breakthroughs in this field. Meanwhile, there is a broad range of AI methods, beyond deep neural networks, to be explored to enhance IR models. These new techniques, like generative adversarial networks [15] and deep reinforcement learning [32], could lead to new IR models as well as to the definition of new core IR tasks.

To incorporate novel AI techniques into IR models, we need to clarify which IR applications AI techniques can be applied to, how much they can improve retrieval performance, what challenges this enhancement will face, what their basic influences on IR models are, and where the deficiencies and the most overlooked aspects lie.

5.2 Proposed Research

The proposed research can be divided into the following four major areas:

IR models with neural representations

By encoding texts or images into real-valued vector representations, neural representations have shown their ability to capture the semantic meaning of objects, and have demonstrated incredible power in natural language processing and image recognition. These neural representation techniques could also bring significant changes to IR by altering the fundamental representations of queries, documents, and so on. Example research questions include:

• How can neural representations be leveraged to enhance the modeling of different IR objects, such as queries, documents, images, questions, and answers? What are the major benefits?
• How can similarity/relevance estimation be enhanced via neural representations?
• How can the efficiency of learning good representations for queries and documents be improved?
• How can neural representations be efficiently indexed for online access?
• How can neural representations be leveraged to enhance cross-modal search?

IR models using deep neural networks

Deep neural networks have shown promising results in modeling complex tasks, thanks to their ability to approximate arbitrary functions. It is natural to apply these powerful models to IR, but simply using them to push up retrieval performance is limiting. There are still many important questions we need to tackle when applying deep neural models to IR, such as:

• What are the right architectures of neural networks for different IR tasks?
• How can deep architectures give us new insights about IR problems?
• What are the appropriate training data, test data, and toolkits for neural models for IR?
• How can we interpret the learned deep neural models for IR?
• What are the relationships between neural models and traditional models for IR?

IR models using reinforcement learning

Reinforcement learning learns how to interact with an environment by maximizing a future reward. Recent progress in reinforcement learning has gained considerable momentum, especially in computer games [32, 41]. Since IR is in essence about the interaction between users and information, or between users and search systems, it is natural to utilize reinforcement learning methods for IR tasks. We list a few potential research directions where reinforcement learning could be applied to enhance IR models:

• Online learning to rank, in order to optimize an IR model dynamically over time
• Session search, where users aim to complete a complex IR task with multiple steps in a session
• Conversational search, where multi-turn interactions between users and systems are involved to obtain the target information
• Some complex ranking tasks, e.g. diversified search, where independence between documents no longer holds and a list needs to be optimized according to multiple criteria
• User modeling, where a dynamic user profile needs to be obtained in order to achieve personalized search or recommendation

IR models using adversarial methods

Adversarial methods are among the important advances in recent AI research; they are useful for producing robust models by introducing adversarial samples or signals. The same idea has also been applied in generative models, i.e. GAN [15], where the generator tries to generate adversarial instances that can cheat the discriminator, while the discriminator tries to be robust to the adversarial instances. It would be promising to borrow these ideas for IR, and we have already witnessed some success in this direction, like IRGAN [48], but there is still much space to be explored:

• How can a robust ranking model be constructed by involving adversarial instances?
• What are the adversarial samples in different IR tasks, and how can they be generated automatically?
• Is it possible to leverage GAN to automatically generate search results, user queries, answers, and so on?
• Is it possible to use the adversarial idea to model the learning and evaluation processes?

5.3 Research Challenges

Reproducibility: Many AI-enhanced IR models (e.g., neural IR models) have a large number of free parameters to be learned, and thus require large quantities of labeled training examples. Due to the lack of large-scale public datasets, many recently published models have been trained and evaluated on private industrial datasets. Their results cannot be fully reproduced, as their datasets are often not available to academia. Moreover, these models often involve many hyper-parameters as well as detailed tuning tricks that are frequently missing from published papers, which makes the issue of reproducibility more serious.

Generalizability: The generalization ability of modern complex AI models is still unclear. Most of them involve a large number of hyper-parameters, making them fragile when transferred from one dataset to another. This is very different from many traditional IR models, which are usually simple but can achieve robust performance out of the box over different datasets. Therefore, we need a better understanding of the key design principles of AI-enhanced IR models to guide us in choosing the proper architectures for different IR tasks.

Interpretability: AI-enhanced IR models, e.g., some deep neural IR models, may behave like a black box and be hard to interpret. This is mainly due to their many nested non-linear functions and the end-to-end training fashion. However, when IR models are applied to accomplish practical tasks, the interpretability of the results as well as of the model itself becomes critical.

Evaluation: With AI-enhanced IR models, many traditional benchmark datasets as well as evaluation metrics may no longer fit well. Meanwhile, many complex tasks, such as conversational search and proactive search, could be taken into consideration. Therefore, the development of stable and discriminative metrics for these novel scenarios and new models becomes urgent. Combined progress on metrics and models may serve as a new breakthrough in this direction.

6 Expanded Role of Users

6.1 Motivation

The user is an important part of an IR system. Although historical user behaviors have been widely used in IR systems to improve ranking quality [2, 16], users are mainly treated as consumers who use IR systems to find the information they need. Users are usually separated from the IR system, and are not considered a part of the information processing workflow. From another perspective, users are assumed to be consuming data, but not “generating” data.

In recent years, with the rapid growth of social networks [33], users are no longer simple consumers, but can produce data, proactively or passively. These data can be consumed by other users or be used to boost the quality of the retrieval service. Users thus play a more important role in an IR system. We need to carefully rethink the definition of users when designing a new IR system, taking users as an important part of the system.

6.2 Proposed Research Topics

Including Users in the IR Circle

The fundamental problem is to redefine the role of users in IR systems. More system functions need to be designed and studied for proactively encouraging users to generate data without additional overload. Efforts need to be spent on moving from the existing document-centric information architecture to a user-centric one. More information retrieval activities should be redefined to include users in the circle of producing and consuming information.

User Profiling and Personalization

In existing IR systems, personalization has not been fully demonstrated to be effective [12]. Sometimes, improper personalization may harm the relevance of results and affect user experience. In most search engines, personalization is mainly about customizing search results based on users’ location and languages. The full advantage of personalization is not exploited due to the complexity of user interests and personalization algorithms. How to control the quality of personalization and implement an effective search result personalization system is still a grand challenge.

Personalization and Diversification
In recommendation systems, generating over-personalized results has become a concern. Users usually need diversified results that are relevant to their information need. How to balance personalization and diversification to improve overall user satisfaction is an interesting problem.

User-centric Evaluation for IR systems
The goal of an IR system is to fulfill users’ information need and make the user satisfied with the overall search experience. Traditional system-centric evaluation paradigms (e.g. the Cranfield-style evaluation) are based on a set of (over-)simplified assumptions about users and therefore cannot perfectly measure the actual user satisfaction and preference. By re-considering and emphasizing the user’s role in the search process, we can develop new evaluation paradigms and methods that are more aligned with users’ experience of IR systems.

6.3 Research Challenges

Privacy Protection
There is always a trade-off between utilizing user data to improve service quality and protecting user privacy. The user privacy protection problem has attracted more and more public attention in recent years.

Supporting Complex Search Tasks
When completing a complex search task, the user usually needs to submit multiple queries to search engines to tackle each subtask separately or to gradually learn about a topic in a multi-query session. Such a complex search task is still considered challenging for the user, and the success of the task may heavily depend on the user’s iterative querying behavior. Therefore, it is important for the IR system to build a task-level or session-level user model, and to provide necessary support (e.g. query suggestions and task-aware ranking) in this scenario.

Understanding Users in Heterogeneous Environments
In recent years, users have been using IR systems in different environments such as mobile search, image search [45], product search, and job and talent search. Users’ search intents, strategies, and behavior may change dramatically across search environments. Therefore, it is crucial to analyze and understand users of different IR systems in different environments.

Correct and Accurate Interpretation of User Behavior
Although historical user behavior is a valuable information source for improving the quality of retrieval service, it is also known to be noisy and biased [37]. For example, previous studies have shown that user clicks are heavily influenced by position bias, and have proposed a series of click models to calibrate the biased CTR signals and extract unbiased relevance feedback. We need further careful investigation of the intrinsic bias in user behavior if we want to exploit it to improve the search system.

7 New System Architecture

7.1 Motivation

In recent years, the rapid development of mobile information services and the large-scale expansion of personalized and specialized information resources provide unprecedented potential and new opportunities for information growth. With the development of the next-generation Internet, information resources tend to be largely distributed. Every Web user would publish messages and share information in social circles, which makes Web information substantially heterogeneous, highly aggregated, and socially shared. The social aggregation feature and the distributed divergence trend present grand challenges to the existing information retrieval system architecture.

Though the HTTP-based Web encourages a high degree of centralization, emerging technologies and protocols such as Blockchain [44] and IPFS (InterPlanetary File System) [3] are trying to establish a trust-driven decentralized Web service. In addition, information resources will probably be addressed by content instead of by domain. Information sharing is limited to a certain range and requires credit as well as permission. It is foreseeable that new technologies will have a profound impact on information retrieval.

According to the re-definition of IR, users will be a core part of the IR framework, not simply participants. To adapt to the rapid accumulation of information resources and the development of new technologies, revolutionary changes to the IR architecture are necessary.

7.2 Proposed Research Topics

Information retrieval framework for fully decentralized network

Current Web information retrieval mainly relies on very large search engines such as Baidu, Google, and Bing. In a fully decentralized network, each user may build and maintain a search engine and provide search services in a distributed cooperation environment. The capability of users to obtain and retrieve information depends on their credit and authority.

Existing Distributed Information Retrieval (DIR) methods may not be suitable for distributed retrieval in the decentralized network because most of the existing DIR methods are inherently centralized. Therefore, it is necessary to design a new distributed information retrieval framework for a fully decentralized Web, in which retrieval requests can be answered quickly and accurately wherever the search request comes from, as long as there is enough credit and permission.

Information collection on complex network structure
With the development of next generation network technologies, the network structure will inevitably be more complex and divergent. The network structure significantly influences the efficiency and effectiveness of information collection. Another factor that affects the collection of information is the way information is provided and shared. The development of social networks will essentially change the traditional way of information crawling. It is necessary to study the complex network structure of the future Web, and to deeply harness the relationship between network structure and the distribution characteristics of information resources, to enable efficient access to high-quality information collection.

Distributed indexing and exchanging mechanism
Future information resources will probably be addressed by content, so that each file (content) is unique. When a file is added to the Web, a unique cryptographic hash value, computed from the content, is assigned to it. Such a mechanism will change the practice of using domain names to access Web content, and presents challenges to the existing inverted indexing mechanism. It is necessary to design an efficient distributed indexing mechanism that facilitates various content representations and, meanwhile, efficiently supports the sharing and exchange of distributed information with permissions under a well-established credit system.

Search results fusion in distributed heterogeneous environment
The fusion mechanism of retrieval results has always been a critical issue in distributed IR systems. On a decentralized network, the existence of massive retrieval units and numerous distributed heterogeneous resources will pose more critical challenges to traditional solutions. How to select and integrate the most useful information from massive search engine units is a crucial problem. The credit of information sources will play a key role. It is necessary to study credit rewarding mechanisms and develop credit-based fusion algorithms to reduce the cost and enhance the efficiency of result fusion.

7.3 Research Challenges

Due to the rapid development of the distributed Web, and the potential changes caused by new technologies, the IR framework will face a wide range of challenges.

(1) More and more data structures are appearing on the Web. How to define a unified data structure is a critical problem for content-based addressing. Moreover, information will be historically versioned, allowing multiple nodes to manipulate different versions of the content. It is a great challenge to design content-based addressing that supports a unified information representation of different data formats and supports retrieval of multiple versions.

(2) A new retrieval framework must support a consensus mechanism in fully distributed and decentralized settings (such as blockchain and its transaction service). However, the current consensus mechanism of blockchain is highly dependent on computing resources and energy consumption, and it is challenging to develop a new consensus mechanism that can support fast and efficient retrieval.

(3) In a distributed environment, the ways in which distributed nodes cooperate are very important. How to establish a well-defined credit system and apply credit to the reliable and trustworthy exchange of original information or indexes is a major challenge.

(4) Due to the numerous search engine nodes and the huge number of documents, the storage architecture of indexing should be carefully re-designed. It is a significant challenge to design an efficient distributed indexing architecture for content representation, and exchange protocols for index files among distributed search engines under privacy protection, for highly efficient and secure retrieval.

(5) A huge number of search engines will exist in the decentralized distributed Web. How to quickly locate related search engines based on content-based addressing and efficiently respond to users with precise fusion results will be a major challenge.

8 New Evaluation Methodology

8.1 Motivation

Evaluation has a long history in the area of IR [13]. The basic IR evaluation methods are based on test collections shared by researchers, which contain a corpus, queries, and relevance assessments. In recent decades, researchers have explored various strategies to evaluate search performance [47].

Previous evaluation methods can be roughly divided into three types: (1) The basic evaluation methods usually try to directly measure the relevant information objects returned by the system. (2) Large-scale log studies have also been proposed and used for this task at search engine companies. With the collected retrieval logs, researchers can observe the interests and information needs of searchers. (3) Researchers define and prescribe a small number of topics. Then, users are asked to find information on these topics. They may also be required to provide feedback via questionnaires.

Although previous methods have been successfully used for evaluating IR tasks, there are several issues which we need to pay attention to. Firstly, existing evaluation models give little consideration to users’ information needs. Although questionnaire-based methods can provide some information about the actions of users, they cannot be used for large-scale evaluation. Secondly, the dynamic nature of the Web may highly impact the retrieval objective over time. Thirdly, there are many complex IR tasks which do not have stable and definite end points. Hence, the aim of this proposed research area is to combine the advantages of these kinds of methods to provide new effective methods for IR evaluation.

8.2 Proposed research

The proposed research can be divided into four major areas: (1) evaluation of spatial and temporal aware multi-resource retrieval, (2) modeling the users, (3) crowdsourcing evaluation methods, and (4) dynamic datasets for complex tasks.

Evaluation of spatial and temporal aware retrieval systems
The first research area is related to how to evaluate the performance of spatial and temporal aware retrieval. Due to the continuous increase of mobile search, the objects that users search for may change dynamically with location and time. The retrieval needs are complex and cannot be clearly defined. It becomes a challenging task to evaluate whether the retrieved documents, images, points of interest or even different kinds of sensors satisfy these needs by combining traditional notions of relevance with spatial and temporal information.

Understanding the general Web search users
The second research area is related to how to understand the general Web search users [24]. Existing evaluation methods usually focus on experienced users with clearly defined tasks. However, there are many different kinds of Web search users. Users can be modeled from their characteristics and their aims on the topic, using different kinds of resources. Hence, how to model the differences among users and how to integrate user modeling into the evaluation metric is an important and challenging research problem.

Crowdsourcing based evaluation
The third research area is how to use crowdsourcing platforms to do the evaluation. Traditional interactive evaluation methods need a laboratory in which users evaluate the systems. Hence, evaluation is a time-consuming and expensive task, and cannot be used for large-scale data sets. Crowdsourcing platforms provide an opportunity to address this problem. A large number of users may participate in the evaluation. However, the quality of evaluation from crowdsourcing platforms is usually lower than that of on-site evaluation. How to design and provide description information for ordinary users is a challenging problem.

Evaluation dataset construction
The fourth research area is related to how to construct datasets used for evaluation. The datasets should contain documents, information seeking requirements, and gold standards. Previous methods usually constructed static datasets to facilitate evaluation. Different types of documents and different kinds of queries should be incorporated into the dataset. Moreover, a novel direction is to construct datasets that can be dynamically changed for realistic evaluation.

8.3 Research Challenges

Evaluating IR systems requires the participation of different kinds of users. However, users are difficult to measure. People with different backgrounds, cultures, and languages may provide different feedback for the same thing. Even a single person under different circumstances may give different results. Hence, how to model users is an important task and is also one of the most important challenges in this area. This kind of information can be represented by user models.

How to obtain large-scale resources for the academic community is another challenge. Potential datasets should contain millions of search records from thousands of users. Laboratory interviews with users and video recordings of screens may also provide valuable information for this task. However, these datasets are expensive to construct, and privacy protection is also an important issue for publicly sharing these datasets.

9 Other Research Topics

9.1 Explainable Information Retrieval

9.1.1 Motivation

For a long time, IR systems have mostly focused on finding relevant results as efficiently and effectively as possible. However, the explainability of IR systems has been largely neglected [14, 46]. The lack of explainability mainly exists in two respects: 1) the outputs of the IR systems (i.e., search or recommendation results) are presented to end users without explanations, and 2) the inner mechanisms of the IR systems (i.e., search or recommendation algorithms) are gray or black boxes to system designers.

This lack of explainability for IR systems leads to major problems in practice [42]. Without making the users aware of why certain results are provided, IR systems may lose their reliability and become less effective in making the users trust the results. More importantly, many IR systems nowadays are not only useful for information seeking, but also useful for complicated decision making by providing supportive information and evidence. For example, medical workers need to retrieve comprehensive healthcare documents to make medical diagnoses [28]. In these critical decision-making tasks, explainability of the IR systems is vital so that users can understand why a particular result is provided and how to take advantage of the result for taking actions.

Recently, deep neural models have been widely used in IR systems [18,19,21,31,50]. Though researchers have achieved notable success in neural IR systems, the complexity and the lack of explainability of neural models further emphasize the importance of research on explainable IR. To bring explainability to IR systems, there is a wide range of research topics for the community to address in the coming years.

9.1.2 Proposed Research Topics

Leveraging Heterogeneous Information for Explainable Information Retrieval
Modern IR systems not only deal with textual documents, but also with a lot of heterogeneous multi-modal information sources [40]. For example, Web search engines have access to documents, images, videos, and audio as candidate results for queries; e-commerce recommendation systems work on user numerical ratings, textual reviews, product images, demographic information, etc., for user personalization and recommendation; and social networks leverage user social relations and contextual information such as time and location for search and recommendation.

Current systems mostly leverage heterogeneous information sources to improve search and recommendation performance. A lot of research effort is needed regarding how to jointly leverage heterogeneous information sources for explainable IR, including research tasks such as multi-modal explanation based on aligning two or more different information sources, transfer learning over heterogeneous information sources for explainable IR, cross-domain explanation in IR systems, and so on.

Personalization in Explainable Information Retrieval
To improve the persuasiveness, trustworthiness, transparency and effectiveness of explainable information retrieval systems, explanations should be personalized for different users. Currently, most of the explanations used in search and recommendation are generated based on data mining techniques such as frequent pattern mining and association rule mining. For example, in e-commerce, the most commonly used explanation is “a certain percentage of users who bought this also bought that” [27], and in social networks the explanation “a certain percentage of your friends also viewed” is generated based on graph mining algorithms [22]. These explanations are not closely coupled with the user’s personalized preferences, and they are also not necessarily related to the IR models that generate the search/recommendation results. As a result, more research effort is needed to explore personalized explainable IR algorithms and systems.

Fusion of Explanations from Different Models
Different explainable IR models may generate different explanations. We usually have to design different explainable models to generate different explanations for different purposes, and the explanations may not be logically consistent.

When the system generates a lot of candidate explanations for a search or recommendation result, a great challenge is how to select the best combined explanations to display in a limited space, and how to fuse different explanations into a logically consistent unified format. Solving this problem requires extensive efforts to integrate statistical and logical machine learning approaches, and to bring in a certain ability of logical inference to explain the results.

Evaluation of Explainable Information Retrieval Systems
Evaluation of explainable IR systems remains an important problem. Explainable IR systems can be readily evaluated with traditional IR measures to test their search/recommendation performance. To evaluate the explanation performance, a currently reliable protocol is to test explainable vs. non-explainable IR models based on real-world user studies, such as A/B testing in practical systems or evaluation with online workers in M-Turk [6]. However, there is still a lack of offline measures to evaluate the explanation performance. Evaluation of explanations is related to multiple perspectives of information systems, including persuasiveness, effectiveness, efficiency, transparency, trustworthiness, user satisfaction, etc. Developing reliable and easily usable evaluation measures will save a lot of effort in the offline evaluation of explainable IR systems.

9.1.3 Research Challenges

Due to the increasing demand for explainability to support comprehensive decision-making tasks in information systems, there are a lot of challenges and opportunities ahead.

(1) Due to the heterogeneous nature of the available data in many information systems, how to integrate information sources of various forms for explainable retrieval is a great challenge.

(2) To support comprehensive decision-making tasks, explainable retrieval models should have the ability to organize different explanations in a logically consistent manner to better help decision makers.

(3) Similar to personalized search and recommendation tasks, the explanations should also be personalized and carefully tailored to different users to improve their effectiveness.

(4) Reliable and generally applicable offline evaluation measures and protocols will help evaluate explainable retrieval systems more efficiently.

9.2 Ubiquitous Information Retrieval

9.2.1 Motivation

With the rapid development of IoT (Internet of Things), users are able to address their information needs anywhere at anytime for anything [17]. Traditional search engines based on desktop computers are therefore facing both challenges and opportunities. Imagine that one day you wake up at home, and your personal assistant (like Amazon Echo) tells you the most important news of the day. Then you drive to work, and the car automatically finds the most convenient route to your office. The very powerful search engine will greatly boost your confidence at work. These are not just imagination, but what is going on in our daily life. Almost every application is constructed based on search. We believe that in the next few years, ubiquitous search will be one of the most promising directions for the IR community.

9.2.2 Proposed Research Topics

Search with ubiquitous devices
According to a report from Google in 2015, the search traffic from mobile devices has exceeded that from conventional desktop devices. This massive shift in search scenario forces both industry and academia to redesign existing technologies in the context of mobile search. Mobile search is different from desktop search in many aspects:
1. Users are searching for different things using mobile devices, e.g. more for entertainment and images, but less for business.
2. Most mobile devices are equipped with a touch screen, which enables a completely different way of interaction. Meanwhile, mobile devices present much less content at a time due to the limited screen size. Thus, users have to incur a higher interaction cost in order to access the same amount of information.
3. Mobile devices also provide much more space for search. For example, it is much easier for search engines to optimize search results with the geographic location.
Considering the differences between mobile and desktop search, it is necessary to calibrate existing techniques to provide better search. Some important research issues include ranking, presentation, personalization, and evaluation.

Search should not be limited to mobile phones or desktop computers. It is possible to search with various intelligent devices. Users’ interactions may be performed in different ways with different devices. For instance, with a smart speaker, users may ask some questions. The results might be retrieved from a knowledge base and will be presented by voice. The response should be as accurate as possible since users can hardly select a relevant result from multiple ones as they do with current desktop search engines. When applying an IR application to a new device, the technique should be carefully designed and evaluated adaptively. It is also possible that future search on various devices will be unified by something like Apple Siri or Microsoft Cortana.

Search for ubiquitous data
Traditional search engines originate from library search and mainly focus on textual search (e.g., Web search). Modern search engines can automatically identify the specific kind of resource that a query may reflect (like Maroon 5 for music) and integrate these results, which are referred to as vertical results, into search result pages. Although existing research has already invested a lot to support search among multi-modal and multi-source data, we believe that the ability to search for ubiquitous data is still far below our expectation.

In the near future, queries will not be restricted to textual content. Some intents can be described by language while others cannot. For example, it is difficult to describe the shape of a maple leaf, or the smell of grass. Search for ubiquitous data is facing the following important problems:
1. Intent Understanding. Intent understanding is always the bottleneck of search techniques. Search engines need to identify the specific kind of data (such as images, video, etc.) that users may want to find.
2. Cross-modal Representation. To support cross-modal search, data resources as well as users’ queries need to be represented in a unified semantic space.
3. Whole-page Optimization. So far, most existing ranking models are explicitly or implicitly based on the PRP framework, i.e. ranking the documents by their relevance probabilities. With more and more vertical results embedded in the search result page (SERP), it is necessary to optimize the utility of the entire SERP, rather than the utility of each individual result.

Search in ubiquitous scenarios
We believe that with the explosive growth of data, IR needs to provide support in more and more scenarios. As we stated before, smart search engines will be incorporated into more devices, such as cellphones, cars, and even televisions and refrigerators at home. Privacy is another important issue in ubiquitous IR research. On the one hand, everything generated by users (including queries, viewed content, etc.) may contain users’ private information. On the other hand, search in various scenarios needs to handle a lot of users’ private data, for example, emails, photos, and notes. These private data may be stored locally, and search engines can only apply their retrieval algorithms without knowing the exact content.

9.2.3 Research Challenges

We believe that in ubiquitous search, there are a few important research challenges as follows:

• For various search devices, it is important to extend the existing “keyword - results” paradigm to various forms. For example, users should be able to search with voice, images, or videos.
• For heterogeneous data, the central problem is to build a cross-modal representation to support search tasks. Though existing studies have already made the first steps, there are still a lot of problems in this field.
• For ubiquitous scenarios, privacy-preserving search may raise research challenges in a number of domains, such as information retrieval and network security.

To summarize, we think that in the next few years, the boundary between different modalities will dissolve. It is promising to develop ubiquitous information retrieval techniques to help people collect information effectively and efficiently.

10 Further Suggestions

10.1 Education of IR

IR is an important course for students in the majors of cyberspace security, computer science, information science, etc. The research scope of IR includes large-scale content crawling, analyzing, organizing and accessing. It also includes the use of natural language processing (NLP), machine learning (ML) and data mining techniques for content processing. The recent advances and trends in IR focus on the combination with other technologies. On the one hand, through studying an IR course, students are able to understand the fundamental theories, models and algorithms of IR, which can be the basis for future research. On the other hand, through practice and through reading classic and advanced research papers, the research ability of students can be trained for future work on intelligent information processing, big data analysis and processing, or practical work, further enriching the pool of professionals in IR.

In light of the recent progress in IR, the content of an IR course should also include: the fundamental concepts and knowledge of IR, the combination of NLP, ML and IR, multimedia retrieval, and the applications of IR.

In terms of teaching, an IR course emphasizes both fundamental and advanced knowledge. Teachers are required to teach the basic knowledge and classic methods as well as introduce advanced developments in IR. They should also leverage diverse teaching methods, such as discussion in class, recommending further reading of textbooks and conference proceedings, training the presentation skills of students, and keeping pace with the international level. IR is a course that has aspects of both practice and application. Experiments are especially important, and we call for more collaboration with people from industry in teaching for better effect.

10.2 Open Source Communities

Open source platforms often play a crucial role in the development of new technologies. For example, the open source platforms for deep learning (e.g., TensorFlow [1], Caffe [23], PyTorch [35], etc.) simplify the building of complex deep learning models, which makes deep learning accessible to everyone. IR needs such platforms too. Lucene [30] is an excellent project for IR applications. However, it no longer meets the needs of recent IR research, especially when we want to develop search engines under new IR models, e.g., models for interactive information retrieval.

The new platforms should have some essential highlights to attract IR researchers and developers. First, the new platforms should be flexible enough to support new IR models or patterns. Second, the new platforms should take the recent advances in IR (e.g., deep learning-based IR especially) into account. Third, the platform should be scalable, reliable and upgradable to meet the demands of distributed computational environments.

Clearly, the unification and standardization for developing new IR platforms should be discussed in depth first. To build such a platform, we first need to make a thorough investigation of actual application and research requirements. The investigation reports should include a detailed requirement analysis of what researchers need to do their experiments and what developers need to build their systems. Then, we can define flexible architectures and self-contained modules based on the investigation results. In particular, some commonly used parts should be provided independently, for example, the software modules for Web crawling, document indexing, ranking and re-ranking, training, etc. After that, we can provide well-defined APIs so that the IR community can bring more and more new models to the platform based on the APIs, overcoming the limits of the existing infrastructure. Most importantly, we should set up some incentive policies to attract volunteers who can share the costs of producing such a powerful open source search system.

Usually the search engines needed by small and medium-sized enterprises are vertical search engines or search engines with special characteristics. If we want to set up a search engine platform that can help small and medium-sized enterprises to develop their search engines, the different software modules of the platform should be developed as independently as possible, and the platform should be tailorable. Besides, it is better to provide some helpful tools and platforms for volunteers to contribute their APIs, and a forum for exchanging their experience and expertise.

Currently one of the major challenges in IR research is the lack of data, especially user behavior data. The IR community should collect data and release them for open research purposes. The data can be either contributed by the community members or collected by the open IR platform.

11 Conclusions

Information retrieval remains a fundamental way for users to explore big data and to access information, facts, and knowledge. It is also evolving very fast, bringing in new possibilities in redefining users, information, retrieval, and evaluation. In this report, we have taken a serious look at these possibilities, and suggested many important research themes for future study. We hope that this strategic report could inspire our IR researchers in both academia and industry, and could help the future growth of the IR research community.

References

1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
2. Eugene Agichtein, Eric Brill, and Susan Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 19–26. ACM, 2006.
3. Ian F Akyildiz, Özgür B Akan, Chao Chen, Jian Fang, and Weilian Su. Interplanetary internet: state-of-the-art and research challenges. Computer Networks, 43(2):75–112, 2003.

4. Pia Borlund. The iir evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research. An International Electronic Journal, 8(3), 2003.
5. Vannevar Bush et al. As we may think. The atlantic monthly, 176(1):101–108, 1945.
6. Chris Callison-Burch. Fast, cheap, and creative: evaluating translation quality using amazon’s mechanical turk. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pages 286–295. Association for Computational Linguistics, 2009.
7. Surajit Chaudhuri and Umeshwar Dayal. An overview of data warehousing and olap technology. ACM Sigmod record, 26(1):65–74, 1997.
8. CL Philip Chen and Chun-Yang Zhang. Data-intensive applications, challenges, techniques and technologies: A survey on big data. Information sciences, 275:314–347, 2014.
9. Charles Clarke. From the chair... ACM SIGIR Forum, 50(1):1, 2016.
10. Nick Craswell, W Bruce Croft, Maarten de Rijke, Jiafeng Guo, and Bhaskar Mitra. Sigir 2017 workshop on neural information retrieval (neu-ir’17). In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1431–1432. ACM, 2017.
11. Nick Craswell, W Bruce Croft, Jiafeng Guo, Bhaskar Mitra, and Maarten de Rijke. Neu-ir: The sigir 2016 workshop on neural information retrieval. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 1245–1246. ACM, 2016.
12. W Bruce Croft, Stephen Cronen-Townsend, and Victor Lavrenko. Relevance feedback and personalization: A language modeling perspective. In DELOS. Citeseer, 2001.
13. W Bruce Croft, Donald Metzler, and Trevor Strohman. Search engines: Information retrieval in practice, volume 520. Addison-Wesley Reading, 2010.
14. David Ellis. Theory and explanation in information retrieval research. Journal of Information Science, 8(1):25–38, 1984.
15. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
16. Laura A Granka, Thorsten Joachims, and Geri Gay. Eye-tracking analysis of user behavior in www search. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 478–479. ACM, 2004.
17. Jayavardhana Gubbi, Rajkumar Buyya, Slaven Marusic, and Marimuthu Palaniswami. Internet of things (iot): A vision, architectural elements, and future directions. Future generation computer systems, 29(7):1645–1660, 2013.
18. Jiafeng Guo, Yixing Fan, Qingyao Ai, and W Bruce Croft. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 55–64. ACM, 2016.
19. Jiafeng Guo, Yixing Fan, Liang Pang, Liu Yang, Qingyao Ai, Hamed Zamani, Chen Wu, W Bruce Croft, and Xueqi Cheng. A deep look into neural ranking models for information retrieval. arXiv preprint arXiv:1903.06902, 2019.
20. Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal processing magazine, 29, 2012.
21. Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 2333–2338. ACM, 2013.
22. Mohsen Jamali and Martin Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the fourth ACM conference on Recommender systems, pages 135–142. ACM, 2010.
23. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia, pages 675–678. ACM, 2014.
24. Diane Kelly et al. Methods for evaluating interactive information retrieval systems with users. Foundations and Trends® in Information Retrieval, 3(1–2):1–224, 2009.
25. Jon M Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.
26. Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
27. Dongwon Lee, Jinsoo Park, and Joong-Ho Ahn. On the explanation of factors affecting e-commerce adoption. ICIS 2001 Proceedings, page 14, 2001.
28. Gang Luo, Chunqiang Tang, Hao Yang, and Xing Wei. Medsearch: a specialized search engine for medical information retrieval. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 143–152. ACM, 2008.
29. Yuanhua Lv and ChengXiang Zhai. Positional language models for information retrieval. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 299–306. ACM, 2009.
30. Michael McCandless, Erik Hatcher, and Otis Gospodnetic. Lucene in action: covers Apache Lucene 3.0. Manning Publications Co., 2010.
31. Bhaskar Mitra and Nick Craswell. Neural models for information retrieval. arXiv preprint arXiv:1705.01509, 2017.
32. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
33. Meredith Ringel Morris, Jaime Teevan, and Katrina Panovich. What do people ask their social networks, and why?: a survey study of status message q&a behavior. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 1739–1748. ACM, 2010.

34. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
35. Adam Paszke, Sam Gross, Soumith Chintala, and Gregory Chanan. Pytorch: Tensors and dynamic neural networks in python with strong gpu acceleration. PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration, 2017.
36. Stephen Robertson, Hugo Zaragoza, et al. The probabilistic relevance framework: Bm25 and beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389, 2009.
37. Alan Said, Brijnesh J Jain, Sascha Narr, and Till Plumbaum. Users and noise: The magic barrier of recommender systems. In International Conference on User Modeling, Adaptation, and Personalization, pages 237–248. Springer, 2012.
38. Gerard Salton, Anita Wong, and Chung-Shu Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.
39. Mark Sanderson and W Bruce Croft. The history of information retrieval research. Proceedings of the IEEE, 100(Special Centennial Issue):1444–1451, 2012.
40. Dipanshu Sharma, Sunil Kumar, and Chandra Kholia. Multi-modal information retrieval system, May 30 2006. US Patent 7,054,818.
41. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
42. Jaspreet Singh and Avishek Anand. Exs: Explainable search using local model agnostic interpretability. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 770–773. ACM, 2019.
43. Richard Socher, Eric H Huang, Jeffrey Pennington, Christopher D Manning, and Andrew Y Ng. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in neural information processing systems, pages 801–809, 2011.
44. Melanie Swan. Blockchain: Blueprint for a new economy. O’Reilly Media, Inc., 2015.
45. Bart Thomee and Michael S Lew. Interactive search in image retrieval: a survey. International Journal of Multimedia Information Retrieval, 1(2):71–86, 2012.
46. Pertti Vakkari and Kalervo Järvelin. Explanation in information seeking and retrieval. In New directions in cognitive information retrieval, pages 113–138. Springer, 2005.
47. Ellen M Voorhees, Donna K Harman, et al. TREC: Experiment and evaluation in information retrieval, volume 63. MIT press Cambridge, 2005.
48. Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. Irgan: A minimax game for unifying generative and discriminative information retrieval models. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 515–524. ACM, 2017.
49. Chengxiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In ACM SIGIR Forum, volume 51, pages 268–276. ACM, 2017.
50. Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, et al. Neural information retrieval: A literature review. arXiv preprint arXiv:1611.06792, 2016.
51. Justin Zobel and Alistair Moffat. Inverted files for text search engines. ACM computing surveys (CSUR), 38(2):6, 2006.