Opus Exploring Publication Data Through Visualizations

Almaha Adnan Almalki Bachelor in Computer and Information Sciences King Saud University, 2014

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Media Arts and Sciences at the Massachusetts Institute of Technology

June 2018 @Massachusetts Institute of Technology, 2018. All rights reserved.

Author Signature redacted

Almaha Adnan Almalki Program in Media Arts and Sciences May 4th, 2018 Massachusetts Institute of Technology Signature redacted Certified By

Prof. Cesar A. Hidalgo Associate Professor in Media Arts and Sciences Massachusetts Institute of Technology

Accepted By Signature redacted

Prof. Todd Machover Academic Head, Associate Professor of Media Arts and Sciences Massachusetts Institute of Technology

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUN 2 7 2018 LIBRARIES ARCHIVES Opus

Exploring Publication Data Through Visualizations

Almaha Adnan Almalki

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Media Arts and Sciences at the Massachusetts Institute of Technology

Abstract

Scientific managers need to understand the impact of the research they support since they are required to evaluate re- searchers and their work for funding and promotional purposes. Yet, most of the online tools available to explore publication data, such as Google Scholar (GS), Microsoft Academic Search (MAS), and Scopus, present tabular views of publication data that fail to put scholars in a social, institutional, and geographic context. Moreover, these tools fail to provide aggregate views of the data for countries, organizations, and journals. Here, we introduce Opus, an interactive online platform that integrates, aggregates, and visualizes publication data from GS to present users with publication data at four different scales (e.g., scholars, countries, organizations, and journals). At each scale, Opus provides benchmarked visualizations that facilitate understanding the work of scholars in a social, generational, geographic, and institutional context. We conducted two user studies with a small group of potential users that show supporting evidence for the benefits of our approach. This design study contributes to the relatively unexplored but promising area of using information to explore publication data. Opus Exploring Publication Data Through Visualizations

Almaha Adnan Almalki

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, On May 5th, 2018 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Media Arts and Sciences at the Massachusetts Institute of Technology

Signature redacted Advisor

Prof. Cesar A. Hidalgo Associate Professor in Media Arts and Sciences Massachusetts Institute of Technology Signature redacted Reader

rof. Iyad Rahwan Associa e Professor i edia Arts and Sciences Massachusetts Institute of Technology Signature redacted Reader

Dr. Rahul Bhargava Research Scientist in Media Arts and Sciences Massachusetts Institute of Technology Acknowledgments

Cesar Hidalgo my advisor thank you for your endless encouragement, feedback, and support throughout this entire process.

Iyad Rahwan and Rahul Bhargava, my readers, thank you for your support and feedback and for inspiring me over the past year.

To my Media Lab friends and Collective learning group, I am grateful for all of your advice and most importantly your friendship. You have provided me extensive personal and professional guidance and taught me a great deal about both scientific research and life in general. I am fortunate to be a part of such a talented, creative, and intellectually promiscuous society.

Finally, I must express my very profound gratitude to my family, my best friend Lamia, and A for providing me with endless support and continuous encouragement throughout my years of study. This accomplishment would not have been possible without them. Thank you. Contents

Abstract 1

1 Introduction 1

2 Literature Review 2

3 System 2 D ata ...... 3 Data Structure ...... 3 D esign ...... 3 Features ...... 4 Scholar ...... 5 Country and Organization...... 6 Journal ...... 6

4 Technical Implementation 6 R eplot ...... 6

5 User Study 7 Study 1 ...... 7 Participants ...... 7 Task and Procedure ...... 7 R esults ...... 7 Study 2 ...... 8 Participants ...... 8 Task and Procedure ...... 8 R esults ...... 8

6 Conclusion 9

References 9 Opus: Exploring Publication Data Through Visualizations

Almaha Almalki Cesar Hidalgo MIT Media Lab MIT Media Lab Cambridge, Massachusetts Cambridge, Massachusetts [email protected] [email protected]

ABSTRACT the generation and spread of knowledge, manag- Scientific managers need to understand the impact of the re- ing publication data across geographic and institu- search they support since they are required to evaluate re- tional boundaries has become increasingly complex searchers and their work for funding and promotional pur- [31]. To search and track the academic literature, poses. Yet, most of the online tools available to explore publi- for-profit and nonprofit organizations have built soft- cation data, such as Google Scholar (GS), Microsoft Academic ware solutions to help organize scholarly publica- Search (MAS), and Scopus, present tabular views of publi- tion data. Platforms such as GS, MAS, and Scopus cation data that fail to put scholars in a social, institutional, are a few examples of the many ways in which and geographic context. Moreover, these tools fail to provide scholars, administrators, and programmer managers aggregate views of the data for countries, organizations, and access scholarly data to understand the impact of journals. Here, we introduce Opus, an interactive online plat- the research they produce and support. form that integrates, aggregates, and visualizes publication These platforms are popular places for scholars and data from GS to present users with publication data at four administrators to find publications and learn about different scales (e.g., scholars, countries, organizations, and the academic career of scholars. But since the usual journals). At each scale, Opus provides benchmarked visual- format in which the data is presented is through a izations that facilitate understanding the work of scholars in tabular view, the functionality of these platforms a social, generational, geographic, and institutional context. could be extended by providing benchmarked vi- We conducted two user studies with a small group of potential sualizations that acknowledge the social and insti- users that show supporting evidence for the benefits of our tutional nature of science. In fact, the use of visu- approach. This design study contributes to the relatively unex- alizations is known to enhance the ability of users plored but promising area of using information visualization to understand complex datasets, such as those in- to explore publication data. volved in the overwhelming velocity and volume of publication data [20, 7, 4]. Author Keywords ; Publication Data; Storytelling; Here, we introduce Opus, an interactive online plat- User Interface form that integrates, aggregates, and visualizes pub- lication data from GS. Opus presents users with INTRODUCTION publication data at four different scales: scholars, countries, organizations, and journals. Opus pro- Academic research is a global, interdisciplinary, and vides information about scholars' perceived level of collaborative endeavor. While the international and experience and proficiency in social, generational, interdisciplinary nature of research contributes to and geographic perspectives. Opus also provides vi- sualizations that show countries' and organizations' Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed scientific outputs across time and their international for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM research collaboration. For journals, Opus provides must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a functionality to trace their impact and publication fee. Request permissions from [email protected]. activity over time, including their geographical di- versity. We conducted two user studies with two D 2019 ACM. ISBN 978-1-4503-2138-9. DOI: 16. 1145/1235 I groups of potential users (scientific manager and type of medium is seen by thousands and shared academic researcher). In the first study, we tested across multiple social media platforms, which has Opus' performance against GS; we concluded that led to creating new opportunities to apply visualiza- the use of visualizations and a simple layout helped tions in data-rich domains such as academia (e.g., potential users to retrieve information faster and publication data). Having such an ecosystem accel- more accurately. In terms of usability, learnability, erated the use of visualizations for storytelling and and ease of use, we conducted a second user study, allowed us to create both author- and reader-driven and Opus' simple layout design and clear visualiza- experiences by facilitating web-based graphical and tions performed well. interactive elements. Platforms such as the Obser- vatory of Economic Complexity [25] and DataUSA LITERATURE REVIEW [10] have innovated the way we structure our nar- Data visualization is a powerful technique to encode ratives by using visualization with a high degree information using visual representations to discover of interactivity so that users can ask new questions essential insights from vast complex datasets. Vi- and discover alternative explanations rather than sualizations allow us to quickly record and analyze following the sequential structure of the traditional data visually to find patterns and communicate these narrative.These platforms were developed in our findings in a less cognitive effort [2]. The use of group the Collective Learning at MIT Media Lab to visualizations naturally supports the essence of sto- enable users to use visualization platforms to create rytelling, which is structuring information to clarify their own stories. relationships and patterns. Reports also show that the use of visualizations has a potential catalytic ef- The use of visualizations to explore scientific data fect on storytelling [30]. Historically, visualizations can be considered limited. Few tools use visualiza- have been used to facilitate analysis and storytelling, tion as their primary medium to explore publication such as the famous rose graphs that showed the data. Tools such as GS and MAS have proven the causes of mortalities during the Crimean War. The lack of visualizations in their current design. On graphs were created in 1858 by the English, social the other hand, tools such as VOSviewer [28] fo- reformer and statistician to an- cus on using one type of visualization (networks) to alyze the causes of death in the British army which construct and visualize bibliometric datasets. These revealed that most of the British soldiers who died networks show journals, researchers, or individual during the Crimean War died of sickness rather than publications, and they can be generated based on ci- of wounds or other causes. Her use of visualizations tation, bibliographic coupling, co-citation, or coau- helped her to communicate her findings to lobby for thorship relations. Scopus also introduces the visual sanitary reform and improved conditions in hospi- analysis feature in its tool to provide a better picture tals. She stated the goal of her visualization was: "to of a scholar's publication history and influence. In affect thro' the Eyes what we fail to convey to the Opus, we focused on developing a tool that allows public through their word-proof ears."[5]. Her work analyses of complex data and the presentation of and others established the principles of a successful stories. By integrating a suite of interactive visual- data visualization, and these principles remain the izations and providing a diverse range of views to same despite the vast change in technology. allow the exploration of publication data, we create a vibrant ecosystem for users to engage more with In recent years we witnessed the rise of internet- publication data and expand the influence of the based visualizations, ranging from art projects (e.g., stories they create. the work of Jonathan Harris, who used a self- organizing particle visualization to explore human SYSTEM emotions, where each particle represents a single Opus is a web-based platform designed to allow emotion posted by a single individual on a global users to visually explore publication data. It pro- scale [14]) to powerful political stories in the New vides the power needed to promote additional in- York Times (Mapping Segregation [27]) [23]. This dicators, including collaboration indicators, so as

2 to change how researchers and scientific managers view, evaluate, and analyze scientific data. The platform allows users to explore the publication data on four different scales: scholars, countries, organizations, and journals. On each scale, a set of questions are answered by using suitable interac- tive visualizations. Unlike other research platforms, Opus is designed to help users explore the work and trajectory of scholars and to investigate their academic impact. It also allows one to discover a scholar's network of collaborators and identify his/her peers. Figure 1. Opus database structue

The platform also provides comprehensive profiles organizations, journals, and papers. Then we con- of countries, organizations, and journals to enable structed a graph data model to make querying the users to analyze nations', organizations', and jour- data more flexible and fine-grained since it allows nals' productivity over time and benchmark their access to the data from any possible point of view output against peers worldwide. [29]. After structuring and cleaning the data, Opus' final dataset contained 732,354 scholars, 247 coun- Data tries, 5,863 organizations, 49,665 journals, and 53M Opus uses the GS dataset; unfortunately, GS does publications. not provide an API for its data; therefore, HTML Data Structure pages were scraped to extract all available data up The Opus database is a graph data model that in- to early 2017. The reason for choosing GS as the cludes nodes with unique IDs and properties that primary data source is because it covers more pub- vary based on node type. For example, the scholar lication data than any other platform [19, 15]. An node contains properties such as scholar name, cita- earlier statistical estimate published in PLOS ONE tion count, publication count, etc. Between different using a mark and recapture method estimated that types of nodes, relations are defined; for GS instance, covers approximately 80-90% of all articles pub- the scholar node has a connection with the publica- lished in English with a rating of 100 million [15]. tion node established as Author-of, which links a GS data have the advantage of being disambiguated, scholar with all of his/her publications. As shown in but they are not free of limitations. One often- Figure 1, there are nine nodes and 15 relation types mentioned limitation is that GS indexes all types of between the nodes. publications, including informal and nonacademic documents. We tackle this limitation by only using Design The publications that matched the list of journals pro- interface design of Opus has evolved through numerous vided by SCImago (ultimately, Scopus). Also, GS iterations in terms of both and feature overestimates the number of citations per publica- specifications, as shown in Figure 2. tion [12]. However, it has also been demonstrated Opus follows an index page information architec- that some citations in GS at the article level correlate ture (IA) pattern that consists of homepage serves as a jump off with the Scopus database [18]. Despite these limi- point for all the other pages. In the initial tations, GS is a great dataset given the complexity proof of the concept, the Opus subpages in- cluded of disambiguating scholarly data at the individual four views (e.g., scholar, country, field, and level. visualization); however, as we evolved, we altered our sub-views to reflect the four main pieces of in- After scraping GS, we started disambiguating the formation found in any paper header (e.g., scholar, data into five different scales of scholars, countries, organization, country, and journal) (see Figure 2.2).

3 ure 3.A) at the top of the page allows users to move between the different views in Opus. We utilized the entire screen to only show one visualization at a I *, time and one legend at the center of the screen (Fig- ure 3.C). On the right side, we added a description about the shown visualization, external information, (1) (2) and any user control elements (Figure3.D). As we Figure 2. Opus design iterations: (1) Opus first design iteration, (2) Opus of the finalized design populated the pages, we discovered that one single page layout cons is that users could become overwhelmed while scrolling the page. This is why we added the fixed navigation (Figure 3.B) menu on the left side to perform as a spy scroll to help users explore our platform better and to encourage scrolling behavior As for the visualization, Opus currently contains seven different types of visualizations. We select visualizations based on the data story we wanted to investigate e (e.g., relationship, comparison, compo- Figure 3. Interface elements within the finalized layout of Opus, v1: (A) sition, or distribution) [24]. For example, we used Main navigation panel (B) Description and parameter selection (C) Vi- sualization panel (D) Index menu. the network to represent the scholar's coauthor net- work, since the network shows relationships as a precise combination of the nodes and links. Each We used this structure to allow users a diverse range visualization presents interactivity by either zoom- of lenses with which to explore the data across vari- ing, hovering, filtering, or clicking on to disclose ous dimensions. more information following 's tax- Highly integrated data can be difficult to display in onomy for information visualizations [24]. For the a simple layout. The Opus interface layout used to graphic design , we chose colors that are follow a navigation tabs layout, as shown in Fig- visually appealing to the users. ure 2.1. The navigation tabs layout allowed users to jump between different layers of information in Features the same page using tabs. After we tested that ap- Based on our user and task analysis, we developed a proach with potential users, we discovered that, by platform that uses visualizations as a medium to re- following such an approach, we were adding more veal trends to provide additional insights. We made complexity and that users were complaining about sure to involve interfaces that acknowledge the so- the number of tabs they needed to click on to dis- cial and institutional nature of science by adding the play information on a single page. After multiple countries', organizations', and journals' views. We iterations, we decided to follow a single-page layout also introduced metrics that can help understand the that featured multiple functions on a single page scholars' impact on a more narrow context (bench- and used scrolling controls to fit multiple contents marked by age and country) to help facilitate com- into a smaller area instead of hiding content behind parison. the tabs [9]. The single page layout removed all the Scholar clutter from the previous layout (see Figure 2.1.), Opus provides information about scholars in a so- leaving a clear and beautiful user interface with con- cial, generational, geographic, and institutional con- cise and focused content. text. First we presented the user with important The Opus system interface (see Figure 3) is divided numbers about the viewed scholar such as the num- into four sections. The fixed navigation panel (Fig- ber of publications and citations to get him/her fa-

4 A

B

C

D

Figure 4. Profile examples of (A:Scholar, B Country, C: Organization, D: Journal) in Opus

miliar with the scholar (see Figure 4.A.1). Users represents shared publications. The network also can interact with interactive visualizations to unlock provides information about the diversity of the coau- the scholar's academic career so as to understand thor network regarding the presented organizations his/her main domains through the use of a title cloud and fields. To explore the scholars' international visualization that uses the paper's title to extract the research collaboration, we tracked the number of most used words and a treemap that displays the publications coauthored between countries based on scholar's fields using Thomson Scientific's Essen- coauthor affiliation. We displayed such a relation tial Science Indicators (ESI) database for the field by using a treemap visualization to display schol- classification of journals [22]. We also displayed ars' collaborators based on organizations and a generational information about the scholar's aca- graph to display scholars' collaborators based on demic output in a simple bar that shows the countries. publication count over the years clustered by the The most used benchmark or metric to measure the fields and another horizontal bar chart to represent success of a scholar is the h-index [1]. However, it the scholar's publication trajectory based on his/her can be unfair when you start comparing all schol- citation count over time. ars without considering their academic age or fields Since research is now increasingly done in teams [I]. In Opus, we measured a scholar's success by across nearly all fields, available tools lack mea- comparing his/her citation count or h-index to other surements to quantify that collaboration [31]. Opus scholars that share the same academic age and visu- provides a way to explore scholars' social capital alized the output as a histogram (see Figure 4.A.3). by using an intractable network that presents their Academic age is the latest year that a scholar pub- coauthor network (Figure 4.A.2), where node size lished a paper in, with at least one citation, minus represents the numbrt of citation and the link weight the year of his/her first published paper. By adding

5 more controls, users will have a better understand- performance and the number of papers they publish ing of the scholar's performance beyond just a static compared to the citations they have received over number. the years. But, more importantly, users can also ex- plore the top organizations and countries published Country and Organization in a particular journal. In Figure 4.D.2, we show the Available tools such as GS and MAS tend to fo- top organizations published in Nature. This abstract cus only on individual-level outputs (presenting a information can be very beneficial for researchers scholar as an individual rather than a part of a coun- in deciding which journal they need to target. try or organization). In Opus, we extend GS and MAS work by introducing new scales (organiza- TECHNICAL IMPLEMENTATION tion and country) to explore publication data in a Opus architecture includes a low-level database generational, geographic, and institutional context. layer, a middle-level API layer, and a high-level Similar to the scholar profile, we start with a gen- user interface layer. The database layer consists eral number to showcase country or organization of a Neo4j-implemented graph database [26]. The performance in terms of publication, citation, and API layer queries data from the database layer us- the number of scholars. Opus then shows the scien- ing GraphiQL [13], and a Ruby connector is im- tific output of an entity presented in a simple line plemented to link both layers. When the API layer chart. Based on the publications of an entity, Opus queries data from the database, it calculates vari- uses a treemap to visualize the diversity of jour- ous agreements and visualization metrics and sends nals colored based on the impact factor based on the them to the interface layer. The user interfaces re- SCImago Journal's 2017 ranking [6]. Users can also quest data through the API layer upon user interac- explore who the top scholars of an organization are tion. Figure 5 shows an abstract view of the Opus in a country within a scatter , where the circle architecture. size represents the number of scholars who share The UI is implemented in the JavaScript React the same number of publications or citations based framework [11] to enable declaration of the user on their academic age. interfaces and to quickly model the state of those in- Opus also provides visualizations (treemap and a terfaces. It allows the use of reusable, composable, map) that show a country's or an organization's in- and stateful components, as well as the immediate ternational collaborators, as shown in Figure 4.B.2. state changes of a component, and it reflects these International collaboration creates a broader engage- changes in the user interface based on user interac- ment and status for an organization, helping it to at- tion. To visualize information in a treemap, network, tract more international students and faculties. Also, bar chart, scatter plot, and word cloud, various visu- such information is critical to consider if, for ex- alization libraries are used such as d3plus-react [17] ample, an agency wants to fund a research entity. and Replot [21], which is a visualization library that This information helps academic managers to trace was built in the course of developing Opus. the evolution of scientific knowledge at the scale of their country or organization and to understand the Replot growth of specific fields (see Figure 4.C. 1), scholars, Using React as a front-end framework provides a lot and publications. of flexibility; however, the problem appears when integrating React and D3 (data-driven documents) Journal seamlessly. Both libraries are built upon data-driven Opus also acknowledges the importance of visualiz- DOM manipulation. Without careful precautions, ing data related to journal performance. We wanted React will not accept any changes made to the DOM. to help researchers quickly find suitable journals One solution is creating a DOM node and passing it that they can be published in, based on, for example, to D3 on each render. It mutates the detached DOM fields, acceptance rate, and impact factor. This is and is then converted to React elements. However, why Opus provides visualizations to show journals' this means building and mutating a full real DOM

6 professional degree, except three with a bachelor's U. degree. All participants were potential users of Opus for their current work as project managers at MIT International Science and Technology Initiatives. They were familiar with tools similar to Opus such as GS and used them on a regular basis to evaluate the scholars. However, none of them had seen or tried Opus before the study. They did not receive any compensation for participating in the study.

Task and Procedure Participants were asked to randomly select a num- ber from 1 to 10 to represent their user ID and to Figure 5. Opus system architecture. determine which tool they would use (Opus or GS). If participants picked an odd number, they would tree on every render and then removing it. This so- conduct the study using Opus and vice versa. The lution is incredibly inefficient because the DOM is study included two steps: gathering information and not lightweight. As a solution, we created our own participants filling out a survey to state their demo- native SVG visualizations for React called Replot. graphics. The second step was the evaluation step, Replot is built by only using React and SVG, allow- where all participants needed to answer the same ing the instant update of data and customization to six questions about a specific scholar and his/her almost every feature in a chart. Currently, Replot academic performance using the assigned tool. The includes a bar chart, line chart, scatter plot, box plot, study was conducted with all participants on one treemap, network, and map. occasion for one hour.

USER STUDY Results We conducted two user studies with the objective to The evaluation phase revealed that participants who assess the potential usability and usefulness of Opus used Opus (5 participants) were able to successfully in visualizing and analyzing publication data. The answer most of the questions accurately compared first study had participants answer questions about to the participants who used GS (4 participants). scholars and their perceived level of experience and 1 summarizes participants' task performance proficiency using Opus or GS. In the second study, in both tools. participants had to answer questions about scholars', For each evaluation question, there was only one par- countries', organizations', and journals' scientific ticipant who did not answer the question correctly performance over time by only using Opus. We when using Opus. This was compared to GS, where also gathered feedback about potential capabilities the average number of participants who answered to add to Opus. We used an approach that is appro- the questions correctly was less than half. We can priate for studying information visualization-based assume that the use of interactive visualizations to projects [8, 16, 3] that include four phases (e.g., tu- display complex information helped participants to torial, practice, evaluation, and collecting feedback). understand better. In our case, we only included the evaluation and gathering feedback phases. Regarding the time, GS and Opus had a similar du- ration, even though participants didn't explore Opus Study 1 before the study. This shows that Opus' simple page Participants layout helped participants to easily navigate the plat- Nine users (5 women and 4 men) participated in form without any significant struggles to find the the study. Participants' ages ranged from 25 to 44, right information. We believe that if we conducted and the majority of participants had a graduate or a study with the tutorial practice phase before the

7 Questions Google Scholar Opus Number Time in min Number Time in min Correct Correct How many co-authors does Cesar Hidalgo have? Which of his 0/4 first and 1.8 4/5 first part 3.3 co-authors has the highest citation count? part 2/4 sec- and 3/5 sec- ond part ond part Is Cesar Hidalgo's citation count above or below the average 2/4 2.1 5/5 1.3 considering his academic age? [the academic age is ... ] How many fields has Cesar Hidalgo published in? 4/4 0.66 5/5 0.41 Has Cesar Hidalgo collaborated with more organizations than 1/4 2.1 4/5 2.3 Iyad Rahwan? What is Cesar Hidalgo's highest cited paper? 4/4 0.47 4/5 1.08 Which year did Cesar Hidalgo publish the most number of 1/4 0.83 4/5 0.50 papers? [please exclude the year 2017 and 2018] Table 1. Results of task performance of the evaluation tasks

evaluation phase, users would be able to find an- swers faster than when using GS. After the study, it was clear that some information was not as easy to find as other information (e.g., most cited paper), and this helped us to further improve the design. toDetractor SPassive *Promoter

Study 2 Figure 6. The breakdown of Opus Net Score. Participants Seven users (5 women and 2 men) participated in the Results study. Participants were aged from 18 to 44, and the The overall assessment from the participants was majority of participants had a bachelor's, graduate, that Opus is a useful platform that helped them pub- or professional degree. The majority of participants lish data in an interesting way. were researchers at the MIT Media Lab, except one who was a relation coordinator and one who was As a part of the feedback survey, we asked partic- a program specialist. All of the participants were ipants to mention at least three aspects they liked potential users of Opus for their current work, and about Opus. Their answers can be clustered into they were familiar with tools similar to Opus such three main features: first, the simplicity of Opus' as GS. However, none of them had seen or tried layout and the appealing graphic design; second, the Opus before the study. Participants received a $10 clarity of visualization and visualization choices; Amazon gift card as a small gesture of our gratitude and, third, all participants liked how Opus displays for their Participation. data in four views (e.g., scholar, country, organiza- tion, and journal). Task and Procedure To measure the Opus net score, we asked partici- Participants were asked to randomly select a num- pants how likely it was that they would recommend ber from 1 to 10 to represent their user ID. The Opus to a friend or colleague. study included three steps: gathering information For the beta version, Opus received a 14.30; however, the net score of and participants filling out a survey to state their GS and MAS was not publicly available. Thus, it demographics. In the second evaluation, we asked is difficult to make sense of the number (see Figure participants to answer questions about academic per- 6). formance (e.g., scholar, country, organization, and journal). Finally, we asked participants to complete To measure Opus' usability, feasibility, and learn- a short 12-question survey about their impressions ability, we asked three questions (see Figure 7). All of the system. Each participant received an individ- seven participants found Opus UI to have a simple ual session of 30 minutes. design and to be intuitive and easy to use/learn. Four

8 that allow them to make sense of complex datasets Q1: Was it difficultto find the inf mtion you aeded InOp ? -... , and conduct competitively intelligent decisions. The contributions of this work include some design ideas that could be employed by others working on

03: Was Opus sauty to 1-ar haw to use? .4 .- . a similar dataset, even in different domains. Also, we proposed a new way to disambiguate such a 3. 4 dataset in four different scales (e.g., scholar, country, Figure 7. Results of of the post-study survey. organization, and journal) to explore the data with acknowledgment of the social nature of science. In out of seven participants particularly liked the fixed the course of developing Opus, we also developed a navigation menu, which helped them to explore the new visualization library, Replot, to help speed up page content smoothly. Other participants liked the the developing process of comparable tools using a consistency of the design across the multiple views. similar front-end framework. They also mentioned how the colors and type of Our design process and system evaluation uncov- visualization used were beautiful and easy to read. ered some promising directions for future research, Because of these design aspects, participants were a few of which were described in the previous sec- able to use the system and learn about the basic func- tion. Incorporating other sources of data such as tionalities quickly. Overall, Opus falls into the easy MAS and Scopus, etc., would provide further ana- to learn/use range (see Figure 7). Responses for the lytic power and could add a new scale to explore ease of use, learnability, and feasibility statements the data (e.g., paper view). The challenge in adding had a group mean value of 4.38 and 4.71 in question such additional data would be to do so in a manner 1, 4.30 in question 2, and 4.14 in question 3, on a that does not introduce significantly greater visual scale from 1 to 5, with 5 indicating extremely easy. complexity and make analytically more difficult. To expand Opus' functionality, we asked partici- As with any research, our study does have some pants to list features they they would like to see on limitations. A more extensive study with more par- the Opus platform. One participant wanted Opus' ticipants can help us better understand Opus' flaws. search engine to have filters to make the search We also feel there is a benefit to providing more easier. Another participant wanted to see the infor- capabilities to Opus (e.g., link publications to real mation presented in visualization as data tables by papers and add more ways to filter those data). We adding a switch view. Three out of seven partici- also see there is a tremendous opportunity to add pants wanted to generate a dynamic visualization of one page to Opus that allows users to combine data the direct comparison of two entities without having from different views to enable them to answer multi- to open two tabs for two entities and to compare ple questions on the same page by creating dynamic them manually. All of these ideas were very helpful visualizations. All of these suggestions can improve and thoughtful, and we to implement them in the user experience and enhance Opus' analysis ca- the future version of Opus. pabilities. While our platform can easily handle several thousand endpoints, we acknowledge that CONCLUSION scalability may be a potential concern. Each of these We have described the development process, design, limitations provides interesting future research op- instantiation, and evaluation of a platform, Opus, portunities. for innovatively visualizing publication data. The platform emphasizes that displaying such a dataset REFERENCES needs to be renovated and that it's necessary to 1. Sergio Alonso, Francisco Javier Cabrerizo, packet in such a clear manner. We believe that Opus Enrique Herrera-Viedma, and Francisco has significant practical relevance for researchers Herrera. 2009. h-Index: A review focused in its and academic managers since they lack the tools variants, computation and standardization for

9 different scientific fields. Journalof 13. Alekh Jindal and Samuel Madden. 2014. Informetrics 3, 4 (2009), 273-289. GRAPHiQL: A graph intuitive query language databases. In Big Data (Big Data), 2. Ricardo Baeza-Yates, Berthier Ribeiro-Neto, for relational InternationalConference on. IEEE, and others. 1999. Modern information retrieval. 2014 IEEE Vol. 463. ACM press New York. 441-450. 3. Rahul C Basole, Trustin Clear, Mengdie Hu, 14. Sep Kamvar and Jonathan Harris. 2009. Wefeel Harshit Mehrotra, and John Stasko. 2013. fine: An almanac of human emotion. Simon Understanding interfirm relationships in and Schuster. business ecosystems with interactive 15. Madian Khabsa and C Lee Giles. 2014. The visualization. IEEE Transactionson number of scholarly documents on the public visualization and 19, 12 web. PloS one 9, 5 (2014), e93949. (2013), 2526-2535. 16. Heidi Lam, Enrico Bertini, Petra Isenberg, 4. Scott Berinato. 2016. Good : The HBR , and . Guide to Making Smarter More Persuasive 2012. Empirical studies in information Data Visualizations. Harvard Business Review visualization: Seven scenarios. IEEE Press. transactionson visualization and computer 5. Lee Brasseur. 2005. Florence Nightingale's graphics 18, 9 (2012), 1520-1536. visual rhetoric in the rose . Technical 17. Dave Landry. 2017. React Components for Communication Quarterly 14, 2 (2005), d3plus Visualizations. (2017). 161-182. https://www.npmjs.com/package/d3plus-react 6. Declan Butler. 2008. Free journal-ranking tool 18. Henk F Moed, Judit Bar-Ilan, and Gali Halevi. enters citation market. (2008). 2016. A new methodology for comparing 7. Stuart K Card, Jock D Mackinlay, and Ben Google Scholar and Scopus. Journalof Shneiderman. 1999. Readings in information Informetrics 10, 2 (2016), 533-55 1. visualization: using vision to think. Morgan 19. Enrique Orduna-Malea, Juan M Ayll6n, Kaufmann. Alberto Martin-Martin, and Emilio Delgado 8. Sheelagh Carpendale. 2008. Evaluating L6pez-C6zar. 2015. Methods for estimating the information visualizations. In Information size of Google Scholar. Scientometrics 104, 3 visualization. Springer, 19-45. (2015), 931-949. 9. Fu-Sheng Chiu. 2007. Single page website 20. Steven Pinker. 1984. Visual cognition: An organization method. (June 7 2007). US Patent introduction. Cognition 18, 1-3 (1984), 1-63. App. 11/362,093. 21. Alisa Ono Sanjay Guruprasad and Almaha 10. Datawheel Deloitte and Cesar Hidalgo. 2016. Almalki. 2018. Native SVG visualizations for DataUSA. (2016). https://datausa.io React. (2018). http://replot.io 11. Instagram Facebook and community. 2013. 22. Thomson Scientific. 2017. Essential Science JavaScript Library for Building User Interfaces. Indicators Research Areas. (2017). (2013). https://reactjs.org http://ipscience-help.thomsonreuters.com/ 12. Matthew E Falagas, Eleni I Pitsouni, George A inCites2Live/83@@-TRS.htm Malietzis, and Georgios Pappas. 2008. 23. Edward Segel and Jeffrey Heer. 2010. Comparison of PubMed, Scopus, web of Narrative visualization: Telling stories with science, and Google scholar: strengths and data. IEEE transactionson visualization and weaknesses. The FASEB journal 22, 2 (2008), computer graphics 16, 6 (2010), 1139-1148. 338-342.

10 24. Ben Shneiderman. 2003. The eyes have it: A 29. Chad Vicknair, Michael Macias, Zhendong task by data type taxonomy for information Zhao, Xiaofei Nan, Yixin Chen, and Dawn visualizations. In The Craft of Information Wilkins. 2010. A comparison of a graph Visualization. Elsevier, 364-371. database and a relational database: a data 25. Alex Simoes. 2010. The Observatory of provenance perspective. In Proceedingsof the Economic Complexity. (2010). 48th annual Southeast regional conference. http://atlas.media.mit.edu/ ACM, 42. 26. Neo Technology. 2007. Graph Database 30. Fernanda B Vidgas, Danah Boyd, David H Jeffrey Potter, and Judith Donath. Management System. (2007). https://neo4j.com Nguyen, 2004. Digital artifacts for remembering and 27. New York Times. 2015. Mapping Segregation. storytelling: Posthistory and social network (2015). https://www.nytimes.com/interactive/2915/07/ fragments. In System Sciences, 2004. 08/us/census-race-map.html?_r-1 Proceedings of the 37th Annual Hawaii 28. Nees Jan van Eck and Ludo Waltman. 2014. InternationalConference on. IEEE, 10-pp. Visualizing bibliometric networks. In 31. Stefan Wuchty, Benjamin F Jones, and Brian Measuring scholarly impact. Springer, Uzzi. 2007. The increasing dominance of teams 285-320. in production of knowledge. Science 316, 5827 (2007), 1036-1039.

I I