Volume 7 Issue 3

Spring 2019

©2019 DataONE, 1312 Basehart SE, University of New Mexico, Albuquerque, NM 87106

Small Teams vs. Large Teams: Who Wins?

Knowledge production by scientists is increasingly a subject of scientific research thanks to easy access to tens of millions of publications, millions of patents, and citation metrics. Much of this research has attempted to unravel the importance of teams to knowledge production and identify optimal sizes and composition of teams. In this article, I summarize a tiny subset of the key findings from three high-profile papers in this area over the past dozen years and relate these findings to the past decade of work conducted by the DataONE team.

In 2007, Wuchty and colleagues documented the increase in team science across almost every research field and reported that teams normally produce more frequently cited research, including exceptionally high-impact research, than individuals. Their results were based on analysis of almost 20 million papers and more than 2 million patents. They postulated “the process of knowledge creation has fundamentally changed.”

In 2013, Uzzi and colleagues examined 17.9 million papers spanning all scientific fields across five decades of the Web of Science, concluding “that science follows a nearly universal pattern: The highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations. Papers of this type were twice as likely to be highly cited works. Novel combinations of prior work are rare, yet teams are 37.7% more likely than solo authors to insert novel combinations into familiar knowledge domains.”

Based on these two papers (and many that reach similar conclusions), teams are clearly needed to tackle today’s complex, interdisciplinary problems, and are doing so with increasing frequency and deeper impact. But questions remain. Is there an optimal team size? Are large teams more effective than small teams? And so on.

Wu and colleagues (2019) at the University of Chicago addressed some of these questions by developing and validating a “disruptiveness index,” and analyzing more than 65 million papers, patents and products over six decades (1954–2014). They found that “smaller teams have tended to disrupt science and technology with new ideas and opportunities, whereas larger teams have tended to develop existing ones.” The findings held across disciplines and applied equally to papers, patents and software products.

Results of the three science-of-team-science research efforts summarized above make for an interesting story. First, science has indeed become more of a team sport, and this can be attributed to the complexity of problems being addressed, coupled with the degree of specialization achieved in individual fields and the relative ease with which teams can be formed and work together using the latest communication and collaboration tools. Second, small teams (in particular) benefit from the free flow of diverse opinions and ideas, which may be more likely to lead to new, radical, transformative solutions. Large teams, on the other hand, tend towards conservatism (probably as a result of democratic choices being made by a large group; i.e., the moderate ideas win out over the more radical, risky ideas) and are necessary to build infrastructure (e.g., telescopes, colliders) and implement large-scale tests of hypotheses (e.g., Long-Term Ecological Research, LTER; National Ecological Observatory Network, NEON), many of which emerged from small teams and individuals.

How does this apply to cyberinfrastructure (CI), in general, and DataONE, more specifically? DataONE is inarguably and necessarily a large team effort involving many dozens of individuals to design, build, user-test, and disseminate the CI solutions, as well as perform education and outreach to the community. Nevertheless, the initial ideas that seeded DataONE largely originated from a very small team of initial informatics and computer science experts. Subsequent and current funded proposals also have all been based on the more “radical,” but fundable, ideas generated in small proposal teams. Yet the actual infrastructure building has required a much larger implementation team that has built and made incremental improvements to the core CI over time.

In essence, both small and large research and CI teams are necessary for the progression of science and technology. Individual and small team research efforts are critical for generating radical new ideas and designs

cont’d on page 4 ›››

SAVE THE DATE
The DataONE Users Group meeting
Monday July 15th in Tacoma, WA
Co-located with the Summer ESIP meeting
Registration will open soon

Member Node DESCRIPTION

In each newsletter issue we will highlight one of our current Member Nodes. The full list of Member Nodes and summary metrics can be found on the DataONE.org site at bit.ly/D1CMNs.

UIC University Library
https://library.uic.edu
A unique member node partnership

The University of Illinois at Chicago (UIC) is located adjacent to downtown Chicago. UIC is a diverse, research-intensive urban public university. It is just over 50 years old (opened in 1965), has more than 31,000 students, and just under 2,000 full-time faculty. UIC comprises 15 academic colleges and has campuses in Chicago, Peoria, Rockford, and Urbana and is classified as “a doctoral institution” with the “highest research activity.” While UIC ranks 58th among U.S. institutions in federally funded research, it does not have an existing collection of datasets in a repository.

So why is an academic research library without a large collection of data running a DataONE Member Node? Simply, it fits their values: to create, preserve, and provide access to knowledge and share expertise. Academic research libraries commit to the stewardship and curation of cultural heritage data and scholarship from all disciplines and formats. Traditional formats include journals, monographs, special collections, and university archives. But today, digital scholarship and datasets are part of library collection development activity. Additionally, research libraries often come together to support community repositories and projects (e.g., LOCKSS, the Public Knowledge Project, DuraSpace) for digital preservation and open access, typically contributing dollars or personnel hours to keep these valuable services going.

The UIC University Library Member Node came online in 2017 as a DataONE replication target. This Tier 4 Member Node uses Metacat and provides space to store replicas of datasets from other Member Nodes that choose to replicate their content. UIC currently hosts more than 7,800 data replicas across the Member Nodes to ensure that science data remains persistently available.

Joining the DataONE Federation supports the UIC Library’s objective to serve science and the public and does not cause undue burden. Deploying the Member Node required a server with 2 CPUs and 8GB RAM, running the CentOS 7 operating system. At about US$2,600 and 20 hours of system administration per year, operating a DataONE Member Node costs less than some of their other community-based preservation and repository service endeavors.

The UIC Member Node adds resiliency and capacity to the federation and supports the long-term data management objectives of other member repositories by ensuring access to data in the event of short-term outages.

Outreach UPDATE

Spring is an exciting, albeit busy, time of year for DataONE Community Engagement and Outreach as we prepare for our summer activities: The DataONE Summer Internship Program, Users Group meeting and conference season are fast upon us.

In the new year, the DataONE team identified six projects that will help DataONE move forward into our new phase of growth and activity. With opportunities including education development, outreach and advocacy, and enhancement of current resources, the projects will enhance exposure of DataONE and build capacity for increased community participation. More importantly, they will provide valuable, real project-based work experience for interns who are pursuing careers or skill development in data science.

The internships are remote and run for a 9-week period over the summer. Open to undergraduate students, graduate students, and postgraduates who have received their degree within the past five years, interns must be currently enrolled or employed at a U.S. university or other research institution in addition to currently residing in, and being eligible to work in, the United States. Details on the six project opportunities can be found at dataone.org/internships and are listed below. There is still time to apply; the current deadline is March 25th.

Project 1. Tools to enhance community driven data management education
Project 2. Provenance for Self or Others? A Study with Hands-on Experiments
Project 3. Supporting Community Outreach and Advocacy for Open Data
Project 4. Reach and Citation of DataONE
Project 5. Build capacity for using DataONE via Python
Project 6. A Reproducible Network Analysis of the DataONE Linked Open Data graph

In July, DataONE will again be co-locating with the Federation of Earth Science Information Partners (ESIP) summer meeting for our annual DataONE Users Group meeting. More information can be found in the DUGOut (page 3) from group chairs Robert Sandusky and Karl Benedict, and registration will soon open via https://www.dataone.org/dataone-users-group/2019-meeting.

We are also partnering with ESIP, in addition to iDigBio, EDI, the Arctic Data Center and GBIF, to host a multi-booth “Data Help Desk” at the Ecological Society of America meeting in Louisville, KY this August. Following the success of shared exhibits at ESA and AGU in 2018, we are excited to continue to support researchers in data solutions and discovery as part of this collaborative activity.


The DUGout

How will the voices of all DataONE stakeholders (repository managers, researchers, data managers, institutions, funders, librarians) be marshalled and amplified to contribute to decisions about DataONE’s transition from a funded project to a long-term program? How can opportunities to expand and strengthen the community be identified, discussed, understood, and prioritized? How can the DataONE Community characterize threats and weaknesses and develop strategies and tactics to address them in a rapidly evolving data management landscape?

The DataONE Users Group (DUG) Steering Committee is working on a white paper addressing these questions for discussion and action at the DUG membership meeting in Tacoma, Washington in July 2019. Drafts that present recommendations for organizing the DataONE community for the next several years will be created and shared with the entire DUG membership throughout the spring and into the summer. The DUG co-chairs are now members of the DataONE Leadership Team and the DataONE Transition Team, thus ensuring that the perspectives of key institutions and focal areas are represented in the white paper and communicated to the DataONE program leadership.

Our current work on the white paper is informed by our participation in the DataONE External Advisory Board meeting, the interactive session we led at the Coalition for Networked Information fall membership meeting, multiple presentations at the American Geophysical Union meeting, and interviews with past and present DUG steering committee members. Listening to the community was our goal at all three meetings and the interviews, and we received a lot of feedback that we are taking on board this spring.

The format of the 2019 DUG meeting will ensure sufficient time for discussion and voting by the members present on a specific set of structures and processes intended to integrate the interests of all stakeholders in the evolving DataONE Community and program. We also plan to discuss and vote on revised mission, vision, and values statements for the DataONE Community.

If you are interested in volunteering now to help us draft the white paper, the mission/vision/values statements, or to program the 2019 DUG meeting, contact us at dugchairs@dataone.org. We would love to work with you! Watch for communications from us in the coming months to give you opportunities to influence the future direction of DataONE.

Register for the Summer meeting:
https://www.dataone.org/dataone-users-group/2019-meeting

Join the DUG!
https://www.dataone.org/sign-up

Robert Sandusky
Co-Chair, DataONE Users Group; University of Illinois, Chicago

Karl Benedict
Co-Chair, DataONE Users Group; University of New Mexico

[Photo: DUG Co-Chairs Robert Sandusky (left) and Karl Benedict (right)]

Communication, Planning & Finances: Learn Tools to Make Your Project Successful
June 2019 SBI Training: Co-Located with the 3rd Annual Digital Data in Biodiversity Research Conference
Strategies for Success: Training for Project Directors, June 12-14: New Haven, CT

Spend 3 days with our expert instructors where you will:
• Create a plan to make your project more financially sustainable and successful;
• Learn how to secure funding from private foundations;
• Hone your skills in strategic planning, finance, and communication; and,
• Network with colleagues who face similar challenges.

The ESA SBI is excited to co-locate its Spring course with the 3rd Annual Digital Data in Biodiversity Research Conference, “Methods, Protocols, and Analytical Tools for Specimen-based Research in the Biological Sciences”. The conference begins on June 10th, and registration is only $50 if you are also attending the SBI course. Register early; space is limited! For more information about the course visit https://esa.org/sbi/

3rd Annual Digital Data in Biodiversity Research Conference information: https://www.idigbio.org/content/save-date-methods-protocols-and-analytical-tools-specimen-based-research-biological-sciences

Meeting Registration: https://www.eventbrite.com/e/3rd-annual-digital-data-conference-methods-protocols-and-analytical-tools-for-specimen-based-tickets-54760252389


UPCOMING WEBINAR
Advancing Research Data Publishing: Dryad’s Next Steps
Tuesday, April 9, 2019
9 am Pacific / 12 noon Eastern
Information and registration at: https://www.dataone.org/upcoming-webinar

Upcoming EVENTS

Members of the DataONE Team will be at the following events. Full information on training activities can be found at bit.ly/D1Training.

Mar. 31 - Apr. 1
Drexel-CODATA FAIR-RRDM Workshop
Philadelphia, Pennsylvania
http://www.codata.org/events/conferences/drexel-metadata-research-centre-and-codata-workshop

Apr. 2-4
Research Data Alliance - 13th Plenary
Philadelphia, Pennsylvania
https://www.rd-alliance.org/plenaries

Apr. 8-9
CNI Spring Meeting
St. Louis, MO
https://www.cni.org/events/mm/spring-2019

Jun. 10-12
Digital Data in Biodiversity (iDigBio)
New Haven, CT
https://www.idigbio.org/

Jul. 15
DataONE Users Group Meeting
Tacoma, WA
https://www.dataone.org/dataone-users-group/2019-meeting

Jul. 16-19
Earth Sciences Information Partners
Tacoma, WA
https://2019esipsummermeeting.sched.com

››› cont’d from page 1

(pushing the state of knowledge to new levels), but typically do not have the capacity to build large infrastructure or test ideas across broad scales of space and time (e.g., LTER, NEON). On the other hand, large teams are essential to incrementally building massive infrastructure and performing long-term, regional to global experiments, but may be at a significant disadvantage in debating and generating radical new ideas and designs.

DataONE has successfully employed both small and large teams to achieve its successes. Furthermore, our initial adoption of the working group model based on the successful National Center for Ecological Analysis and Synthesis has enabled us to use both small working groups (or subsets thereof) to generate new ideas and designs, and the entire team to collaboratively build and refine the infrastructure. We have also worked hard to inculcate diversity of ideas and opinions into our team by selecting members who differ in gender, race, ethnicity, institution, age, and academic status. Idea diversity is then balanced by a governance structure that ensures that decisions among competing ideas will be made in an orderly fashion, so that infrastructure building and refinement can proceed. In short, done right and with sufficient resources, both small and large teams can win.

William Michener
Principal Investigator, DataONE

[1] Wuchty, S., B.F. Jones, B. Uzzi. 2007. The increasing dominance of teams in production of knowledge. Science 316:1036-1039. DOI: 10.1126/science.1136099
[2] Uzzi, B., S. Mukherjee, M. Stringer, B. Jones. 2013. Atypical combinations and scientific impact. Science 342:468-472. DOI: 10.1126/science.1240474
[3] Wu, L., D. Wang, J.A. Evans. 2019. Large teams develop and small teams disrupt science and technology. Nature 566:378-382. DOI: 10.1038/s41586-019-0941-9

DATAONE SUMMER INTERNSHIP PROGRAM

The DataONE Summer Internship program runs May through July and provides an opportunity for students and postgraduates to work with DataONE on a nine-week project. We are pleased to offer support for four great opportunities. Available projects include:
• Tools To Enhance Community Driven Data Management Education
• Provenance For Self Or Others? A Study With Hands-On Experiments
• Supporting Community Outreach And Advocacy For Open Data
• Reach And Citation Of DataONE
• Build Capacity For Using DataONE Via Python
• A Reproducible Network Analysis Of The DataONE Linked Open Data Graph

For details on each of these projects, information on eligibility, stipend and how to apply visit: www.dataone.org/internships

Apply by March 25th 2019


Featured RESOURCE

By the Numbers: DataONE Metrics Are Now Live
http://dataone.org/numbers

In order to highlight how DataONE improves data access, provides training and education, and builds community, we developed and launched DataONE’s new By the Numbers metrics page. Our data discovery and education metrics are now live and accessible on our website. You can see how much data is available through DataONE, growth in the amount of data accessible since 2012, and common searches of these data.

Interested in more information on where data are coming from? On our By the Numbers page you can find out how many repositories are members within the DataONE network and search a global map to see where in the world they are located. New member repositories are regularly joining and can be found under Newest Member Repositories.

Interested in participating? Our DataONE Users Group has around 600 members, representing research scientists, data managers, librarians, and a host of other job titles, primarily from academic positions. These metrics and more demonstrate the community’s continuing interest in advancing data management as led by DataONE.

Working Group FOCUS

Usability in DataONE: The User-eXperience Lab

A unique aspect of DataONE is how usability has been closely integrated into the design process through work with the University of Tennessee’s User-eXperience Lab (UXL). In 2018, UXL focused on the DataONE Search interface (search.dataone.org), including how DataONE partners, Whole Tale and Make Data Count, have been incorporated into the interface. UXL results provide DataONE with a more robust interface with expanded capabilities that improves access for its users, and continues to offer integrated access to data from its stakeholders.

Last month, the UXL conducted a usability study of Make Data Count and its data citation and usage metrics exposed through the DataONE discovery interface. The study looked at how users interpret the metrics and interact with the features, and how these capabilities are incorporated into DataONE Search. Improvements will be made based on the study and will aid both the overall usability of the metrics display and users’ understanding of the information. The UXL plans to conduct further usability testing as more metrics are added.

The UXL also worked with the Whole Tale project to improve its interface. Whole Tale is a web-based platform for creating “tales”, a new type of preservable research object that combines data, software, and narrative into a single re-runnable package. Through the Whole Tale interface users can publish their tales to supporting DataONE member nodes. The UXL completed two UX studies early in the design process, first with PDFs and then with interactive mock-ups. Early UX testing allows Whole Tale to identify UX issues before spending significant resources on the user interface.

The User-eXperience Laboratory provides the capability to conduct in-person and remote UX studies, allowing DataONE and its partners to reach their customers and stakeholders and to ensure that their products and tools are easily accessible, usable, and effective for all users. UXL will continue its usability work with DataONE and its partners in 2019.

Learn more about these projects by visiting their websites: https://makedatacount.org/ and https://wholetale.org/, and learn more about the User-eXperience Lab at https://cics.cci.utk.edu/user-experience-lab

1312 Basehart SE
University of New Mexico
Albuquerque, NM 87106
Fax: 505.246.6007

DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.

Project Director:
William Michener
[email protected]
505.814.7601

Executive Director:
Rebecca Koskela
[email protected]
505.382.0890

Director of Community Engagement and Outreach:
Amber Budden
[email protected]
505.205.7675

Director of Development and Operations:
Dave Vieglais
[email protected]

By The NUMBERS
Live online at: www.dataone.org/numbers

DATA DISCOVERABLE THROUGH DATAONE

53 TB of content
816 K metadata
1.18 M data

Activity this Month for Member Repositories
42K Uploads
305K Downloads

2,446 Visitors to our search page*
1,172 Searches conducted*
*metrics are running monthly averages

Top Search Terms (Previous Month)
1. Carbon
2. South Coast
3. Water

[Chart: Files Uploaded (Data, Metadata). Source: CN.DATAONE.ORG. Only the first version of each file is counted.]

OUR COMMUNITY Repositories in the DataONE Federated Network

45 Participating Repositories

New Members this Quarter
CARY INSTITUTE OF ECOSYSTEM STUDIES
INTERDISCIPLINARY EARTH DATA ALLIANCE (IEDA)

596 DataONE User Group members

LEARNING IMPACTS

Users trained: 5300+

Most Downloaded Resources Last Month
1. Data Management Plan Example for NSF
2. Best Practices Primer
3. Data Management Plan Example from Mauna Loa

Webinar Series
93 Average number of attendees
1,849 Unique webinar attendees

Education Resources
19,456 Visits to the public webpage*
216 Education Module downloads*
*metrics are running monthly averages

Most Visited Pages Last Month
1. Internships
2. Data Life Cycle
3. Best Practices: Create and document data backup policy

Metrics inclusive of February 2019