Innovation Information Initiative What comes after MAG?
Samuel J. Klein
Published on: May 27, 2021 License: Creative Commons Attribution 4.0 International License (CC-BY 4.0)
Overview of citation graphs and tools
This is an interlayer description for (scholarly) citation graphs:
1. What are these for, and what is their scope?
2. What things like this exist now, for various contexts?
3. How are these updated, by which curators, and through what processes?
4. What are the upstream and downstream sources + derivatives?
5. What do we want the above to become, in the fullness of time?
Focus and challenge
Compiling a global citation graph (or the subset relevant to your current research context) in a format that's convenient for [re]calculating metrics and training models.
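A minimal sketch of what "convenient for [re]calculating metrics" can look like, assuming the graph is held as a plain edge list of (citing, cited) pairs. The `paper:*` IDs and the function names are hypothetical placeholders; a real graph would key edges on shared PIDs and use much larger, on-disk formats. Recomputing citation counts is then a single pass over deduplicated edges:

```python
from collections import Counter

# Hypothetical edge list: each pair is (citing_id, cited_id).
# Real IDs would be shared PIDs such as DOIs; these are placeholders.
EDGES = [
    ("paper:A", "paper:B"),
    ("paper:A", "paper:C"),
    ("paper:B", "paper:C"),
    ("paper:D", "paper:C"),
    ("paper:A", "paper:B"),  # duplicate, e.g. the same citation seen in two sources
]


def citation_counts(edges):
    """Return in-degree (times cited) per paper, ignoring duplicate edges."""
    return Counter(cited for _citing, cited in set(edges))


def reference_counts(edges):
    """Return out-degree (number of distinct references) per paper."""
    return Counter(citing for citing, _cited in set(edges))


if __name__ == "__main__":
    for paper, n in citation_counts(EDGES).most_common():
        print(paper, n)
```

Deduplicating edges first matters once several upstream sources are merged, since the same citation often arrives more than once.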
What do we want this to become?
Here are 5 things that everyone building an open academic-graph commons can contribute to, and a 0th thing (standards that can help align our efforts) :)
Elements of the commons we want:
0. simple standards for being part of the commons: open, forkable code + data; transparent processes; a commitment to register IDs, scripts, vocabularies, schemas, and processes with a shared registry (Wikidata or equivalent)
1. a federated data pipeline (what can others build to speed this up?): a source catalog + associated scripts; a script library for processing/cleaning and disambiguation; a federated event feed (what exists, and what more is needed?); named processes for reproducing dataset outputs from the above
2. a vocabulary of core entities, and a set of PIDs others can build against for each (not an internal PK for each project; most projects don't need to generate a new PID for most entities)
3. a set of datasets released on a time series, w/ explicit + consistent (MAG used to provide one; whatever OR builds will be another; incremental updates are a bonus)
4. a set of services available online, for free / at cost / at burden
5. internal documentation + interlayer description: an overview (what is the future of the OAG?) extending 'outside' reflections like this, with contributions from everyone providing part of the above; and a maintenance + dependency checklist (what upstreams + downstreams does the OAG depend on? How could someone rebuild it from scratch, or support its maintainers?)
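To illustrate item 2 (shared PIDs rather than a per-project primary key), here is a minimal sketch assuming DOIs are the shared identifier. The project dictionaries, their internal keys, and the helper names are hypothetical; other PIDs (ROR, ORCID, etc.) would need their own normalization rules. Because DOI names are case-insensitive, records from two projects can be joined on a normalized DOI instead of on each project's internal key:

```python
import re

# Prefixes commonly seen in front of a bare DOI; stripped before comparison.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw):
    """Return a canonical form of a DOI string, or None if it doesn't look like one.

    DOI names are case-insensitive, so lowercasing is safe for matching.
    """
    doi = _DOI_PREFIX.sub("", raw.strip()).lower()
    return doi if doi.startswith("10.") and "/" in doi else None


# Hypothetical records from two projects, each keyed by its own internal PK.
PROJECT_X = {"x-001": "https://doi.org/10.1234/ABC", "x-002": "doi:10.5555/xyz"}
PROJECT_Y = {"y-17": "10.1234/abc", "y-18": "not-a-doi"}


def crosswalk(*sources):
    """Group internal keys from several sources under the shared PID they resolve to."""
    by_pid = {}
    for source in sources:
        for internal_key, raw_id in source.items():
            pid = normalize_doi(raw_id)
            if pid is not None:  # records without a usable PID are left out
                by_pid.setdefault(pid, []).append(internal_key)
    return by_pid


if __name__ == "__main__":
    for pid, keys in crosswalk(PROJECT_X, PROJECT_Y).items():
        print(pid, keys)
```

Records that fail normalization (like `y-18`) are the ones that still need a disambiguation step before they can join the shared graph.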
What exists now
Concordance of citation graphs
Other lists + aggregators: Do other concordances exist? Lists of resources: GitHub "awesome" lists; the Wikipedia list of graphs; the list of academic databases, which includes Internet Archive Scholar and fatcat.
Citation graphs themselves: Microsoft Academic / Open Academic Graph; internal graphs at metrics providers; Web of Science; Lens.org; Publish or Perish; Depsy (deprecated; citations for software).
Search engines: Get The Research; Semantic Scholar (derivatives: citation intent, paper IDs, author IDs); Dimensions.
Metrics: ImpactStory (altmetrics, https://profiles.impactstory.org/); Clarivate; Dimensions.
How are these updated?
Most internal/commercial pipelines are opaque. Dimensions updates some things continuously, other things (GRID) twice a year.
Topic maps → citation existence → disambiguating article + author IDs → citation affect
Crossref
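For records that arrive without a shared PID, one common first step in disambiguating articles is to group candidates on a normalized "blocking key". The sketch below uses a naive title + year key; this is an illustrative assumption, not the method any of the providers above actually uses, and real disambiguation would add fuzzy matching, author and venue signals, and review. Record contents and source names are hypothetical:

```python
import re
import unicodedata
from collections import defaultdict


def match_key(title, year):
    """Build a crude blocking key: accent-stripped, lowercased title words plus year."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", text.lower())
    return (" ".join(words), year)


# Hypothetical records for what may be the same article, seen by different sources.
RECORDS = [
    {"source": "s1", "id": "s1:9", "title": "Citation Graphs: A Survey", "year": 2020},
    {"source": "s2", "id": "s2:4", "title": "Citation graphs - a survey.", "year": 2020},
    {"source": "s3", "id": "s3:1", "title": "Citation Graphs: A Survey", "year": 2019},
]


def cluster(records):
    """Group record IDs whose keys collide; each group is a candidate 'same work'."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec["title"], rec["year"])].append(rec["id"])
    return list(groups.values())


if __name__ == "__main__":
    print(cluster(RECORDS))
```

The 2019 record stays separate only because of its year; a preprint and its published version often differ in exactly this way, which is why blocking keys yield candidates for review rather than final merges.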
Draft specs: Event feed → what is needed? Data pipeline → what can others build to speed this up? ID set → (OurResearch spec) → coming out soon :) Mainly, we want people to actually be open!
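One possible shape for the event feed, sketched under assumptions: the `Event` fields (`seq`, `op`, `citing`, `cited`) are a hypothetical schema, not a published spec. A consumer keeps a snapshot plus the last sequence number it applied, so incremental updates can be replayed without reprocessing the whole graph:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One entry in a hypothetical federated event feed (schema is an assumption)."""
    seq: int     # monotonically increasing sequence number assigned by the feed
    op: str      # "add" or "remove" a citation edge
    citing: str
    cited: str


def apply_events(edges, events, since_seq=0):
    """Apply events with seq > since_seq to a set of (citing, cited) edges.

    Returns the updated edge set and the last sequence number applied, so the
    consumer can resume from that point on the next incremental update.
    """
    updated = set(edges)  # copy: the input snapshot is never mutated
    last_seq = since_seq
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.seq <= since_seq:
            continue  # already applied in an earlier run
        if ev.op == "add":
            updated.add((ev.citing, ev.cited))
        elif ev.op == "remove":
            updated.discard((ev.citing, ev.cited))
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
        last_seq = ev.seq
    return updated, last_seq


# Hypothetical starting snapshot and feed entries.
SNAPSHOT = {("paper:A", "paper:B")}
FEED = [
    Event(1, "add", "paper:A", "paper:C"),
    Event(2, "remove", "paper:A", "paper:B"),
    Event(3, "add", "paper:B", "paper:C"),
]

if __name__ == "__main__":
    edges, cursor = apply_events(SNAPSHOT, FEED)
    print(sorted(edges), cursor)
    # Replaying from the saved cursor changes nothing.
    print(apply_events(edges, FEED, since_seq=cursor) == (edges, cursor))
```

Returning the cursor with the edges is what makes re-running the same feed a no-op, one way to pair incremental updates with reproducible dataset outputs.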
Process writeup: What comes after MAG?
Data sources: Limiting what else
Open requests: How do people currently use the MAG API? What's missing so far? (conference proceedings, non-DOI works, an open list for requesters, ML classification)
IDs → What new ones exist? What's being maintained? MAG ID →
Attendees → IDs: GRID / ROR / SS / IA [new primary key]; OAIR: [SS / Meta / BN? / Crossref / MAG / Lens] → clarify degree of open code + data → publisher agreements
API access: read-only GETs (as per MAG?)
===
Patent feeds as well.
COAR/BASE compared to UPW (Unpaywall)
What are related up + downstreams?
170 dataset papers drawing on MAG; Reliance on Science
Where do we want to be? + related research
What comes after MAG?
Microsoft Academic Graph changed the landscape of what was possible with citation graphs. At its launch 7 years ago, it was mostly complete and mostly free to reuse. It was updated by a talented team at Microsoft, which did extensive document processing on a wide range of source formats. It quickly became a staple of every aggregator of such data, and people came to rely on its identifiers, author identification, and topic mapping.
Related research: “Zenodo in the Spotlight of Traditional and New Metrics”