
Some practical difficulties in creating field-normalized altmetrics

“There is only one thing in life worse than being talked about, and that is not being talked about.” Oscar Wilde

Michael Taylor

Work smart. Discover more. Introduction

• Researching subject-based variations in different forms of altmetrics for PhD

• One practical outcome is to underpin approaches to creating subject-normalized altmetrics

• This is particularly important as altmetrics are being increasingly examined for research evaluation: users are expressing an interest in analysis

• Furthermore, there are two distinct use-cases:
  • to read “off the page” when an article is “highly shared”;
  • to perform more nuanced evaluation work, typically with a portfolio

• Citation analysis has a long history of subject-normalized metrics, eg, FWCI, NCI, and ‘top-slicing’ – ie, analysing the top 10% by citation

• It is tempting to simply follow these approaches, but these metrics make assumptions about the data that don’t necessarily hold for altmetrics
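A minimal sketch of the two citation-style approaches mentioned above, assuming a simplified "expected" value equal to the field mean (FWCI and NCI also condition on publication year and document type, which is left out here). The function names and the sample counts are hypothetical and are not taken from the data set.

```python
from collections import defaultdict
from statistics import mean


def field_normalized(articles):
    """Divide each article's count by the mean count of its own field.

    `articles` is a list of (field, count) pairs. Simplification: the
    expected value is just the field mean.
    """
    by_field = defaultdict(list)
    for field, count in articles:
        by_field[field].append(count)
    field_mean = {field: mean(counts) for field, counts in by_field.items()}
    return [
        count / field_mean[field] if field_mean[field] else 0.0
        for field, count in articles
    ]


def top_slice_share(articles, fraction=0.10):
    """Share of each field's articles that sit in the overall top `fraction`."""
    ranked = sorted((count for _, count in articles), reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    threshold = ranked[cutoff - 1]  # ties at the threshold are all included
    totals = defaultdict(int)
    in_top = defaultdict(int)
    for field, count in articles:
        totals[field] += 1
        if count >= threshold:
            in_top[field] += 1
    return {field: in_top[field] / totals[field] for field in totals}


# Hypothetical share counts, for illustration only
sample = [
    ("Psychology", 40), ("Psychology", 10), ("Psychology", 0),
    ("Psychology", 6), ("Psychology", 4),
    ("Physical Chemistry", 2), ("Physical Chemistry", 0),
    ("Physical Chemistry", 1), ("Physical Chemistry", 5),
    ("Physical Chemistry", 2),
]

normalized = field_normalized(sample)
shares = top_slice_share(sample, fraction=0.20)
```

With a 20% slice on this sample, both top-slice places go to psychology, even though the chemistry paper with 5 shares scores 2.5 times its own field mean. The two approaches can therefore disagree about which articles stand out.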

Data set

• 9610 articles drawn from the NIH’s publication database

• Classified into 13 subject areas on an article-by-article basis using machine learning (not journal classification)

• Subject areas with more than 200 articles are analysed:
  • Biochemistry and Cell Biology; Cardiorespiratory Medicine and Haematology; Clinical Sciences; Genetics; Immunology; Medical Microbiology; Neurosciences; Oncology and Carcinogenesis; Other Physical Sciences; Paediatrics and Reproductive Medicine; Physical Chemistry; Psychology; Public Health and Health Services

• Citations from NIH’s iCite tool

• Data direct from Mendeley’s API

• Altmetric data from Altmetric.com’s API

• Costa et al, “Exploring paths for the normalization of altmetrics: applying the Characteristic Scores and Scales” - http://altmetrics.org/wp-content/uploads/2016/09/altmetrics16_paper_6.pdf

• Taylor, “Engineers Don’t Blog” – poster at 2AM - https://figshare.com/articles/Engineers_Don_t_Blog_and_Other_Stories_why_Scopus_uses_subject_area_benchmarking_/1568135

Methodology

• Four data points considered:

  • Citations
  • Mendeley readers & Citeulike saves
  • Altmetric Attention Score
  • Social Network activity (Twitter, Facebook, Reddit, Google+)

• For each of the four data points, articles with a zero value are placed in a distinct category

• Articles with non-zero values are placed into deciles (NB collisions are an issue, eg, a Social Network count of 1 straddles more than one decile)

• Examining the number of documents in each classification / decile, and the mean values within that decile
  • How does ‘zero’ compare to a decile?
  • Are deciles reasonably evenly distributed?
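The steps above can be sketched as follows, assuming rank-based deciles over the non-zero values. This is one possible reading of the method, not necessarily the procedure used for the charts. The function names and the sample counts are hypothetical.

```python
from collections import defaultdict


def decile_table(values):
    """Split values into a zero bucket and rank-based deciles (10..100).

    Non-zero values are sorted, and the value at rank i of n is assigned
    to decile (i * 10 // n + 1) * 10.
    """
    zeros = [v for v in values if v == 0]
    nonzero = sorted(v for v in values if v > 0)
    n = len(nonzero)
    buckets = defaultdict(list)
    for i, v in enumerate(nonzero):
        buckets[(i * 10 // n + 1) * 10].append(v)
    return zeros, dict(buckets)


def straddling_values(buckets):
    """Return the values that appear in more than one decile (collisions)."""
    seen = defaultdict(set)
    for decile, vals in buckets.items():
        for v in vals:
            seen[v].add(decile)
    return {v: sorted(d) for v, d in seen.items() if len(d) > 1}


# Hypothetical social network counts for one subject area
counts = [0] * 5 + [1] * 8 + [2] * 4 + [3, 3, 4, 5, 7, 9, 15, 40]

zeros, buckets = decile_table(counts)
collisions = straddling_values(buckets)
zero_share = len(zeros) / len(counts)
decile_means = {d: sum(v) / len(v) for d, v in sorted(buckets.items())}
```

In this sample a count of 1 lands in deciles 10 to 40 and a count of 2 in deciles 50 and 60, which illustrates the collision problem noted above: the lower deciles cannot be told apart by value.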

Number of articles in deciles – citation

[Figure: Distribution of Documents in Citation Deciles. x-axis: Decile (0 to 100); y-axis: % of documents (0% to 35%); one series per subject area. Callout on chart: 20%.]

Number of articles in deciles – Mendeley & Citeulike

[Figure: Distribution of Documents in Mendeley and Citeulike Deciles. x-axis: Decile (0 to 100); y-axis: Documents (%) (0% to 16%); one series per subject area. Callout on chart: 10%.]

Number of articles in deciles – Altmetric Attention Score

[Figure: Distribution of Documents in Altmetric Attention Score Deciles. x-axis: Decile (0 to 100); y-axis: Documents (%) (0% to 90%); one series per subject area. Callout on chart: 30%.]

Number of articles in deciles – Social Network Activity

[Figure: Distribution of Documents in Social Network Deciles. x-axis: Decile (0 to 100); y-axis: Documents (%) (0% to 90%); one series per subject area. Callout on chart: 35%.]

Mean values in deciles – citation

[Figure: Average Citation Count Per Decile. x-axis: Decile (0 to 100); y-axis: Citation count (0 to 160); one series per subject area. Callouts on chart: c100 citations; c20 citations.]

Order change might stabilize with geometric mean

Mean values in deciles – Mendeley & Citeulike

[Figure: Average Mendeley and Citeulike Save Per Decile. x-axis: Decile (0 to 100); y-axis: Mendeley and Citeulike Save (0 to 200); one series per subject area.]

Mean values in deciles – Altmetric Attention Score

[Figure: Average Altmetric Attention Score Per Decile. x-axis: Decile (0 to 100); y-axis: Altmetric Attention Score (0 to 120); one series per subject area.]

Mean values in deciles – Social Network

[Figure: Average Social Network Share Per Decile. x-axis: Decile (0 to 100); y-axis: Social Network Share (0 to 70); one series per subject area.]

Observations using Social Network data

[Figures: Distribution of Documents in Social Network Deciles (Documents (%) against Decile); Average Social Network Share Per Decile (Social Network Share against Decile)]

• To some extent, these observations can be made for all four data sources, and for all disciplines

• Only limited similarity between high log growth and low zero %

• The order of the disciplines in the upper decile is different to that in the mid-range deciles

• Is there equivalence between a mean score of 5 and a mean score of 65?

• Upper-range trend approximates to log growth (y = x^m2)

• Mid-range trend approximates to arithmetic growth (y = m1x)

• Difference between max and min in the upper decile is a multiple of that in the mid-range
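A minimal sketch of how m1 and m2 might be estimated from decile means, taking the two trend forms exactly as written on the slide (y = m1x and y = x^m2). Which deciles count as "mid-range" and "upper-range" is an assumption here, and the data points are synthetic, generated from known m1 and m2 so the fit can be checked.

```python
import math


def fit_linear_through_origin(points):
    """Least-squares slope m1 for y = m1 * x."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)


def fit_power_exponent(points):
    """Least-squares exponent m2 for y = x ** m2.

    Fitted in log space as ln(y) = m2 * ln(x), so every x must be
    greater than 1 and every y must be positive.
    """
    logs = [(math.log(x), math.log(y)) for x, y in points]
    return sum(lx * ly for lx, ly in logs) / sum(lx * lx for lx, _ in logs)


# Synthetic (decile, mean share count) pairs, not taken from the charts
mid_range = [(20, 4.0), (30, 6.0), (40, 8.0), (50, 10.0), (60, 12.0)]
upper_range = [(80, 80 ** 0.9), (90, 90 ** 0.9), (100, 100 ** 0.9)]

m1 = fit_linear_through_origin(mid_range)   # recovers the slope 0.2
m2 = fit_power_exponent(upper_range)        # recovers the exponent 0.9
```

Real decile means will not sit exactly on either curve, so residuals or confidence intervals would be needed before comparing m1 and m2 across disciplines.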

Research questions

• Does each data source and each discipline exhibit three Zero, Arithmetic, Log (ZAL) features?

• Can we identify the range where log growth exceeds arithmetic?
  • (Look at linear and log divisions, regression analysis, confidence)

• Are growth states independent? Does an article with log growth require an arithmetic stage? Is it an error to see arithmetic and log growth as being the same type of activity?

• Can we reliably quantify m1 and m2 for the different disciplines? (And probability P for zero-values)

• Should platforms allow for analysis using ZAL values?

• For different data sources, the balance between ZAL importance will shift: eg, Z is less important for Mendeley, L will rarely be an issue for policy documents. How can we support analysts to use the most relevant metric?

• Does it matter if very few chemistry papers are ‘highly shared’ compared with psychology papers, or compared with a baseline of 10%?
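Two of the quantities raised in these questions can be sketched in a few lines, assuming the trend forms y = m1x and y = x^m2 and a decile scale of 10 to 100. The function names and the parameter values in the example are hypothetical.

```python
def zero_probability(values):
    """Estimate P, the probability that an article has a zero value."""
    return sum(1 for v in values if v == 0) / len(values)


def first_decile_power_exceeds_linear(m1, m2, deciles=range(10, 101, 10)):
    """Return the first decile x where x ** m2 exceeds m1 * x, or None."""
    for x in deciles:
        if x ** m2 > m1 * x:
            return x
    return None


# Hypothetical values for one discipline and one data source
p_zero = zero_probability([0, 0, 3, 1])
crossover = first_decile_power_exceeds_linear(m1=2.0, m2=1.2)
no_crossover = first_decile_power_exceeds_linear(m1=100.0, m2=1.0)
```

Here the power form overtakes the linear form at decile 40, and the second call returns None because no crossover occurs in the range. A per-discipline triple of P, m1 and m2 would be one way to report ZAL values side by side.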
