Innovation Information Initiative What comes after MAG?

Samuel J. Klein

Published on: May 27, 2021

License: Creative Commons Attribution 4.0 International License (CC-BY 4.0)

Overview of graphs and tools

This is an interlay for (scholarly) citation graphs:

1. What are these for / what is their scope?

2. What things like this exist now, for various contexts?

3. How are these updated, by which curators + what processes?

4. What are the upstream and downstream sources + derivatives?

5. What do we want the above to become, in the fullness of time?

Focus and challenge

Compiling a global citation graph (or a subset relevant to your current context), in a format that’s convenient for [re]calculating metrics and training models.

What do we want this to become?

Here are 5 things that everyone building an open academic-graph commons can contribute to, and a 0th thing (standards that can help align our efforts) :)

Elements of the commons we want:

0. simple standards for being part of the commons: open, forkable code + data; transparent processes; commitment to register IDs, scripts, vocabularies, schemas, processes w/ a shared registry (WD or equivalent)

1. a federated data pipeline (-> what can others build to speed this up?): a source catalog + associated scripts; a script library for processing/cleaning and disambiguation; a federated event feed (-> what exists, what more is needed?); named processes for reproducing dataset outputs from the above

2. a vocabulary of core entities, and a set of PIDs others can build against for each (not an internal PK for each project; most projects don't need to generate a new PID for most entities)

3. a set of datasets released on a time series, w/ explicit + consistent (MAG used to provide one; whatever OR builds will be another; incremental updates are a bonus)

4. a set of services available online, for free / at cost / at burden

5. internal documentation + interlayer description: an overview (What is the future of the OAG?), extending 'outside' reflections like this, w/ contributions from everyone providing part of the above; and a maintenance + dependency checklist (what upstreams + downstreams does the OAG depend on? How can someone rebuild it from scratch, or support its maintainers?)
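Element 2 above (shared PIDs rather than a new internal PK per project) can be pictured with a small concordance. This is only a sketch under stated assumptions: the `Entity` and `Concordance` names, the scheme labels ("doi", "mag"), the case-insensitive matching, and the identifiers themselves are made up for illustration and do not come from any existing project.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One core entity (a work, author, or institution) and its known PIDs."""
    kind: str                                  # e.g. "work" or "institution"
    pids: dict = field(default_factory=dict)   # scheme name -> identifier string


class Concordance:
    """Index entities by (scheme, identifier) so that any known PID resolves."""

    def __init__(self):
        self._index = {}

    def register(self, entity):
        # Index the entity under every PID it carries.
        # Assumption: identifiers are matched case-insensitively.
        for scheme, value in entity.pids.items():
            self._index[(scheme, value.lower())] = entity

    def resolve(self, scheme, value):
        # Return the entity for this PID, or None if it is unknown.
        return self._index.get((scheme, value.lower()))

    def translate(self, scheme, value, target_scheme):
        # Map a PID in one scheme to the same entity's PID in another scheme.
        entity = self.resolve(scheme, value)
        return None if entity is None else entity.pids.get(target_scheme)


# Placeholder identifiers, not real records.
graph = Concordance()
paper = Entity("work", {"doi": "10.0000/Example.1", "mag": "1000000001"})
graph.register(paper)
print(graph.translate("mag", "1000000001", "doi"))
```

The point of the design is that a downstream project stores whichever PID it already has and asks the shared registry for the others, instead of minting its own key.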

What exists now

Concordance of citation graphs

Other lists + aggregators

Do other concordances exist?

Lists of resources: (github-awesome lists) (wp list of graphs)

List of academic : includes Internet Archive Scholar, fatcat

Citation graphs themselves / Open Academic Graph Internal graphs @ metrics-providers Web of Lens.org Depsy (deprecated): ( for software)

Search engines GettheResearch Derivatives: citation-intent, paper-ID, author-ID Dimensions

Metrics ImpactStory: https://profiles.impactstory.org/ (alt metrics) Clarivate Dimensions

How are these updated?

Most internal/commercial pipelines are opaque. Dimensions updates some things continuously, other things (GRID) twice a year.
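Because most pipelines are opaque, the following only illustrates what an incremental update between two dated releases of a citation graph could look like. It does not describe how any provider named here works. It assumes a release is a list of (citing PID, cited PID) pairs; the function names and the identifiers are hypothetical.

```python
def diff_snapshots(old_edges, new_edges):
    """Return (added, removed) citation edges between two releases."""
    old_set, new_set = set(old_edges), set(new_edges)
    return sorted(new_set - old_set), sorted(old_set - new_set)


def apply_delta(edges, added, removed):
    """Rebuild the newer release from the older one plus a delta."""
    return sorted((set(edges) - set(removed)) | set(added))


# Two made-up releases; each edge is (citing PID, cited PID).
release_a = [("w1", "w2"), ("w1", "w3")]
release_b = [("w1", "w2"), ("w2", "w3"), ("w4", "w1")]

added, removed = diff_snapshots(release_a, release_b)
print("added:", added)
print("removed:", removed)

# A consumer holding release_a can catch up without re-downloading everything.
assert apply_delta(release_a, added, removed) == sorted(release_b)
```

Publishing the delta alongside each full release would let consumers choose between a complete rebuild and a cheap catch-up.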

Topic maps -> Citation existence -> Disambiguating article + author ID -> Citation affect Crossref

Draft specs:

Event feed -> what is needed?

Data pipeline -> what can others build to speed this up?

ID set -> (OurR spec) -> coming out soon :) mainly want people to actually be open!

Process writeup: What comes after MAG?

Data sources: Limiting what else

Open requests: How do people currently use the MAG API? What's missing so far? (conf proceedings, non-DOIs, open list for requesters, ML classification)

IDs —> What new ones exist? what’s being maintained? : MAG ID —>

Attendees —> : IDs — GRID / ROR / SS / IA [new primary key] OAIR : [SS / Meta / BN ? / Crossref / MAG / Lens] —> clarify degree of open code + data —> publisher agreements

API access

: read-only GETs (as per MAG?)
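A read-only interface of this kind might look like the sketch below. It is a toy dispatcher, not the MAG API or any existing service: the `/works/<id>` path layout, the `handle` function, and the records in `WORKS` are all assumptions made for illustration.

```python
import json

# Hypothetical in-memory store: identifier -> record. Placeholder data only.
WORKS = {
    "w1": {"title": "Example work", "cites": ["w2"]},
    "w2": {"title": "Another example work", "cites": []},
}


def handle(method, path, store=WORKS):
    """Return (status_code, json_body) for a request; only GET is served."""
    if method.upper() != "GET":
        # Anything that could modify data is rejected outright.
        return 405, json.dumps({"error": "read-only API: use GET"})
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "works" and parts[1] in store:
        return 200, json.dumps(store[parts[1]])
    return 404, json.dumps({"error": "not found"})


print(handle("GET", "/works/w1"))
print(handle("POST", "/works/w1"))
```

Keeping the surface to GETs means the service can sit in front of a static dataset release and be mirrored or cached freely.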

===

Patent feeds as well

COAR/BASE compared to UPW

What are related up + downstreams?

170 dataset-papers drawing on MAG

Reliance on Science

Where do we want to be? + related research

What comes after MAG?

Microsoft Academic Graph changed the landscape of possibility for uses of citation graphs. At launch 7 years ago, it was mostly complete and mostly free to reuse. It was updated by a talented team at MS, which did extensive document processing on a wide range of source formats. It quickly became a staple of any aggregator of such data, and people started to rely on its identifiers, author identification, and topic mapping.

Related research

“Zenodo in the Spotlight of Traditional and New Metrics”
