
Volume 54 Number 1 February 2020 ISSN 0023-6772

Laboratory Animals THE INTERNATIONAL JOURNAL OF LABORATORY ANIMAL SCIENCE, MEDICINE, TECHNOLOGY AND WELFARE

Official Journal of AFSTAL, ECLAM, ESLAV, FELASA, GV-SOLAS, ILAF, LASA, NVP, SECAL, SGV, SPCAL

Special Issue: Severity Assessment in animal based research

Published on behalf of Laboratory Animals Ltd. journals.sagepub.com/home/lan by SAGE Publications Ltd.

Contents

Severity Assessment in animal based research

Editorial: Severity Assessment in animal based research 16 A Bleich, M Bankstahl, P Jirkof, J-B Prins and RH Tolba

Special Issue Articles

Nest-building performance in rats: impact of vendor, experience, and sex 17 K Schwabe, L Boldt, A Bleich, RM van Dijk, SOA Helgers, C Häger, M Nowakowska, A-K Riedesel, K Schönhoff, B Struve, J Wittek and H Potschka

Measurement of corticosterone in mice: a protocol for a mapping review 26 CHC Leenaars, S van der Mierden, M Durst, VC Goerlich-Jansson, FL Ripoli, LM Keubler, SR Talbot, E Boyle, A Habedank, P Jirkof, L Lewejohann, P Gass, R Tolba and A Bleich

Design of a joint research data platform: A use case for severity assessment 33 SR Talbot, S Bruch, F Kießling, M Marschollek, B Jandric, RH Tolba and A Bleich

Systematic analysis of severity in a widely used cognitive depression model for mice 40 AS Mallien, C Häger, R Palme, SR Talbot, MA Vogt, N Pfeiffer, C Brandwein, B Struve, D Inta, S Chourbaji, R Hellweg, B Vollmayr, A Bleich and P Gass

Where are we heading? Challenges in evidence-based severity assessment 50 LM Keubler, N Hoppe, H Potschka, SR Talbot, B Vollmar, D Zechner, C Häger and A Bleich

Wheel running behaviour in group-housed female mice indicates disturbed wellbeing due to DSS colitis 63 N Weegh, J Füner, O Janke, Y Winter, C Jung, B Struve, L Wassermann, L Lewejohann, A Bleich and C Häger

A safe bet? Inter-laboratory variability in behaviour-based severity assessment 73 P Jirkof, A Abdelrahman, A Bleich, M Durst, L Keubler, H Potschka, B Struve, SR Talbot, B Vollmar, D Zechner and C Häger

Improvement of the Mouse Grimace Scale set-up for implementing a semi-automated Mouse Grimace Scale scoring (Part 1) 83 L Ernst, M Kopaczka, M Schulz, SR Talbot, L Zieglowski, M Meyer, S Bruch, D Merhof and RH Tolba

Semi-automated generation of pictures for the Mouse Grimace Scale: A multi-laboratory analysis (Part 2) 92 L Ernst, M Kopaczka, M Schulz, SR Talbot, B Struve, C Häger, A Bleich, M Durst, P Jirkof, M Arras, RM van Dijk, N Miljanovic, H Potschka, D Merhof and RH Tolba

Defining body-weight reduction as a humane endpoint: a critical appraisal 99 SR Talbot, S Biernot, A Bleich, RM van Dijk, L Ernst, C Häger, SOA Helgers, B Koegel, I Koska, A Kuhla, N Miljanovic, F-T Müller-Graff, K Schwabe, R Tolba, B Vollmar, N Weegh, T Wölk, F Wolf, A Wree, L Zieglowski, H Potschka and D Zechner

News

Life cycle of a FELASA working group 113 J-P Mocho

The didactic booklet "Mi experiencia con la ciencia" (My experience with science): a SECAL initiative to bring children closer to scientific experimentation 115 H Serna

Each of our customers is different

For each customer’s requirement we have a solution

Your experienced Swiss partner for all laboratory animal diets

Quality and service − Successful together

KLIBA NAFAG, Rinaustrasse 380, CH-4303 Kaiseraugst, T. +41 61 816 16 16, www.kliba-nafag.ch, [email protected]

Catalogue 2020 – now available! Request your free copy at: finescience.de

FINE SURGICAL INSTRUMENTS FOR RESEARCH™

Laboratory Animals

Editorial Board

Editor-in-Chief: B Riederer
Deputy Editors: G Jarvis, P Jirkof

Section / Section Editors
Anaesthesia, Analgesia & Stress: M Leach, P Foley, P Hedenqvist
Anatomy and Neuroscience: B Riederer, S Wells (neuro)
Aquatic Organisms: K Finger-Baier (fish), JP Mocho, M Crim
Behaviour: D Preissmann, M Gyger, L Lewejohann
Biostatistics & Experimental Design: R-D Gosselin, H Würbel
Education: P Vergara, C Thöne-Reineke
Imaging Techniques: L van der Weerd, J Tremoleda
Large Animal Models: M Jensen-Waern, D Anderson, T Morris
Management of Animal Facilities: J-B Prins, M Dennis
Molecular & Genetic Engineering: T Ruelicke, P Cinelli
Nutrition and Diets: G Tobin, T Nortey
Pathology & Microbiology: P Clements, D Salvatori, A Bleich
Physiology & Clinical Chemistry: M Sommers, T Hough
Primates: G Rainer, P Honess, C Witham
3Rs & Ethics: G Griffin, A Olsson
Reproductive Biology: H Hedrich, B Pintado, C Gilbert
Small Animal Models: M Berard, J-B Prins (temporary), S Wells
Surgical Procedures: D Bouard, R Tolba
Systematic Review: M Ritskes-Hoitinga, BS Kousholt
Toxicology: F Rutten
Veterinary Medicine: E Rivera, J Sanchez-Morgado, L Whitfield, M Berard
Special Issue Microbiota guest editors: Axel Kornerup-Hansen and Craig Franklin

Subscription information
Annual subscription (2020) including postage: AALAS Member (print and electronic) [£254/US$470]. Combined Institutional Rate (print and electronic) [£339/US$662]. Electronic only and print only subscriptions are available for institutions at a discounted rate. Note VAT is applicable at the appropriate local rate. Visit sagepublishing.com for more subscription details. To activate your subscription (institutions only) visit online.sagepub.com. Abstracts, tables of contents and contents alerts are available on this site free of charge for all. Student discounts and single issue rates are available from SAGE Publications Ltd, 1 Oliver's Yard, 55 City Road, London EC1Y 1SP, UK, tel. +44 (0)20 7324 8500, email [email protected] and in North America, SAGE Publications Inc, PO Box 5096, Thousand Oaks, CA 91320, USA.

Advertisement Managers
PRC Associates Ltd, 1st Floor Offices, 115 Roebuck Road, Chessington, Surrey KT9 1JZ, UK; Tel: +44 (0) 20 8337 3749; Fax: +44 (0) 20 8337 7346; Email: [email protected]

Laboratory Animals Ltd
Laboratory Animals Ltd is a company limited by guarantee and has no share capital. The Memorandum of Association obliges the company to apply all its resources to the advancement of public education in laboratory animal science, technology and welfare. It is a registered charity (Registered Charity Number 261047) and none of its directors may receive any fee or remuneration.
Registered Office: Laboratory Animals Ltd, 44 Springfield Road, Horsham, West Sussex, RH12 2PD, UK

Council of Management
Chairman: J-B Prins; Secretary: E Weir; Treasurer: J Gregory
L Antunes, P Nowlan, K Applebee, J Orellana, V Baumans, B Riederer, A Ritchie, N Kostomitsopoulos, N Ezov, M Ritskes-Hoitinga, C Gilbert, A Shortland, J Guillen, S Wells, J Helppi, M Wilkinson, C Johner

Comment and correspondence relating to editorial matters may be sent to the Chairman of the Editorial Board by email: laeditorial@sagepub.co.uk; or post: LAL, PO Box 373, Eye, Suffolk, IP22 9BS, UK. See also http://www.lal.org.uk

Laboratory Animals (ISSN 0023-6772) is published and distributed bimonthly (February, April, June, August, October, December) in both print and electronic form by SAGE Publications Ltd, 1 Oliver's Yard, 55 City Road, London EC1Y 1SP, UK.

All manuscripts submitted for publication should be prepared in accordance with the Guidelines for Authors which can be found online at journals.sagepub.com/home/lan. Please submit your paper online at mc.manuscriptcentral.com/la, and contributions for news items can also be made via manuscript central.

© 2020 Laboratory Animals Ltd. Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the UK Copyright, Designs and Patents Act, 1988, no part of this publication may be reproduced, stored, or transmitted, in any form or by any means, without the prior permission of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of licences issued by the appropriate Reproduction Rights Organization outside the UK. Enquiries concerning reproduction outside the terms stated here should be sent to SAGE at the address. Whilst every effort is made to ensure that no inaccurate or misleading data, opinion or statement appears in the journal, Laboratory Animals Ltd wish to make it clear that the data and opinions appearing in the articles and advertisements herein are the responsibility of the contributor or advertiser concerned. Accordingly, Laboratory Animals Ltd and their officers and agents accept no liability whatsoever for the consequences of any such inaccurate or misleading data, opinion or statement.

Printed in Great Britain
AFSTAL
Association Française des Sciences et Techniques de l'Animal de Laboratoire
President: Sebastian Paturance
Vice President: Elodie Bouchoux
Secretariat: 28, rue Saint Dominique, 75007 Paris, France (www.afstal.com)

GV-SOLAS
Gesellschaft für Versuchstierkunde (Society for Laboratory Animal Science)
President: Bettina Kraenzlin
Secretary: Nicole Linklater
Faculty of Biology, Philipps University, Karl-von-Frisch Str. 8, 35043 Marburg, Germany (www.gv-solas.de)

SECAL
Sociedad Española para las Ciencias del Animal de Laboratorio (Spanish Society for Laboratory Animal Science)
President: Isabel Blanco Gutierrez
Vice President: Juan Rodriguez Cuesta
Secretary: Julia M. Samos Juárez
Treasurer: Viviana Bisbal Velasco
Secretariat: c/Maestro Ripoll, 8, 28006 Madrid, Spain (www.secal.es)

ECLAM
European College of Laboratory Animal Medicine
President: Patricia Hedenqvist
Secretariat: Janet Rodgers, 266 Banbury Road, No. 314, Oxford OX2 7DL, UK

ILAF
Israeli Laboratory Animal Forum
President: Amir Rosner
Secretary: David Castel, Neufeld Cardiac Research Institute, Sheba Medical Center, Tel Hashomer 52621, Israel (www.ilaf.org.il)

SGV
Schweizerische Gesellschaft für Versuchstierkunde / Société Suisse pour la Science des Animaux de Laboratoire (Swiss Laboratory Animal Science Association)
President: Dr. Birgit Ledermann
Secretary: Dr. med. vet. Andrina Zbinden
Faculty of Science and Medicine, University of Fribourg, Ch. du Musée 8, CH-1700 Fribourg, Switzerland (www.naturalsciences.ch/organisations/sgv)

ESLAV
European Society of Laboratory Animal Veterinarians
President: Peter Glerup
Honorary Secretary: Massimiliano Bardotti
Honorary Secretary: Frederic Decrock
Secretariat: c/o Decrock, 78, bd Gallieni, 92130 Issy les Moulineaux, France (http://eslav.org)

LASA
Laboratory Animal Science Association
President: Anne-Marie Farmer
Secretary General: Miles Maxwell
PO Box 524, Hull, HU9 9HE, UK (www.lasa.co.uk)

SPCAL
Sociedade Portuguesa de Ciências em Animais de Laboratório (Portuguese Society for Laboratory Animal Science)
President: Isabel Vitória Figueiredo
Vice-President: Ricardo Afonso
Secretary: Catarina Pinto Reis
Secretariat: Laboratório de Farmacologia, Faculdade de Farmácia, Largo de D. Dinis, 3000 Coimbra, Portugal (www.spcal.pt)

FELASA
Federation of European Laboratory Animal Science Associations
President: Hanna-Marja Voipio
President-elect: Ana Santos
Past President: Heinz Brandstetter
Hon. Secretary: Jean-Philippe Mocho
Secretariat: PO Box 372, Eye, IP22 9BR, UK (www.felasa.eu)

NVP
Nederlandse Vereniging voor Proefdierkunde (Dutch Association for Laboratory Animal Science)
President: Martje Fentener van Vlissingen
Secretary: Jan Langemans
Secretariat: BPRC, Lange Kleiweg 139, 2288 GJ Rijswijk, The Netherlands (www.proefdierkunde.nl)

ssniff

Spezialdiäten

“The Best or Nothing”

Standard feeds – cereal based (chow) for all animal species Special / Purified diets Medicated feed

Perfect Animal – Diet models Purified diets • for every research purpose (e.g. diabetes) Obesity, metabolic syndrome, NIDDM, atherosclerosis, (high fat and/or cholesterol)

Hypertension and chronic kidney disease (± adenine) Chronic liver diseases (steatohepatosis, ASH, NASH) Diets with deficits or excesses of certain nutrients

Test compound diets with

Doxycycline Tamoxifen (flavoured diets)

Customers’ statins

Customers’ compounds for toxicity or PK studies (ssniff has been the test manufacturer for several CROs)

Diets with hormone supplementation (17-β-estradiol, DHT)

Spezialdiäten GmbH, Ferdinand-Gabriel-Weg 16, DE-59494 Soest, Germany, +49-(0)2921-9658-0, [email protected], www.ssniff.com

Batch sizes as small as 1 kg

ssniff – dedicated to research for decades

24-26 MARCH 2020 EDINBURGH

Register now for AST 2020, the UK's largest animal science and technology conference. The meeting will offer a varied programme with parallel poster and workshop sessions, and a high-quality trade exhibition.

Our well-known and respected meeting partners include NC3Rs, UAR, RSPCA, UFAW and AAALAC. Keynote speakers include Dr Jim Reynolds, Professor Eddie Clutton, Mr Steven Tsui and Professor Paul Flecknell.

The conference will deliver an excellent social programme including an evening to explore Edinburgh at your leisure as well as the opportunity to visit the University of Edinburgh facilities.

REGISTER YOUR ATTENDANCE NOW

WORKING TOGETHER FOR LABORATORY ANIMAL SCIENCE AND TECHNOLOGY

REGISTER TODAY

Your trusted global partner in life-saving research. TM High Density MAX Rodent Housing

• Increases cage density by 20%

• More cost effective capacity increase vs. new construction/renovation

• Accommodates NexGen MAX reusable & EasyCage MAX disposable cages

• Up to 192 cage capacity / 960 mice per rack

• Automatic or bottle watering available

• Works with WiVarium Plus, Wi-Com InSight, Wi-Com Sensus, IVC Sentinel EAD, and other Allentown Facility Support Solutions

20% More Cages in the Same Size

To view a webinar on our MAX High Density Rodent Housing go to www.AllentownInc.com/MAXWebinar

® Allentown, LLC © 2019. All Rights Reserved. WWW.ALLENTOWNINC.COM

Sentinel is a trademark of Allentown, LLC. EAD is a registered trademark of Charles River Laboratories, Inc.

Flexible film and rigid isolation solutions for all your containment needs

● Holding Isolators

● Surgical Isolators

● Hypoxic Chambers

● Wall Mounted Transfer Hatches

● Desiccator Cabinets

Wall Mounted Transfer Hatches With Mechanical and Timed Electronic Interlocks

44 Potters Lane, Milton Keynes MK11 3HQ, tel: 01908 305 725, fax: 01908 305 729, email: sales@pfisystems.co.uk, www.pfisystems.co.uk

Delivering higher standards

Environment Enrichment

Enhance your research with stimulating Enrichment products from LBS Biotech... Choose from our extensive product range so your research animals can enjoy an active, stimulating and comfortable environment. DesResTM Rodent Houses, Fun Tunnels, Toys, Balls, Chews, Treats, Bedding, Foraging - just some of our quality assured products, suitable for use in biotechnology conditions.

Contact the experts: Tel: +44 (0)1293 827940 Email: [email protected] www.lbs-biotech.com

Explore the Laboratory Animals Handbooks

NEW EDITION!

The Design of Animal Experiments: Reducing the Use of Animals in Research through Better Experimental Design (2nd Edition), Michael Festing

Health and Safety in Laboratory Animal Facilities, edited by Margery Wood and Maurice W. Smith

Parasites of Laboratory Animals, Dawn G Owen

Find out more and buy online at uk.sagepub.com/lahandbooks

Laboratory Animals

The international journal of laboratory animal science, medicine, technology and welfare, Laboratory Animals publishes peer-reviewed original papers and reviews on all aspects of the use of animals in biomedical research. The journal promotes improvements in the welfare or well-being of the animals used, particularly focusing on research that reduces the number of animals used or which replaces animal models with in vitro alternatives.

Read the latest content at journals.sagepub.com/home/lan

journals.sagepub.com/home/lan


Charlotte: 71st AALAS NATIONAL MEETING, OCTOBER 25 - 29, 2020

Join us for the 71st AALAS National Meeting in Charlotte, North Carolina. Each fall since 1950, the American Association for Laboratory Animal Science has held its annual National Meeting. During the five days of the meeting, members and nonmembers come together to enjoy the workshops, lectures, poster sessions, and exhibits. The program is designed to have topics relevant to the entire membership. Exhibitors have an opportunity to interact with AALAS members from the academic community, research institutions, government organizations, and commercial companies. The AALAS National Meeting is the largest gathering in the world of professionals concerned with the production, care, and use of laboratory animals.

American Association for Laboratory Animal Science Phone: (901) 754 - 8620 Fax: (901) 753 - 0046 [email protected] www.aalas.org

JOIN US IN CHARLOTTE, NC OCTOBER 25 - 29, 2020

conventional systems • air flow systems • enrichment • enclosure systems • transport systems • cleaning systems

Passion for quality.

More than 70 years of experience as one of the leading manufacturers and suppliers of complete solutions for scientific animal husbandry.

An example: Enclosure systems for keeping marmosets usually consist of several single cage modules which are mounted together and can be modified into larger units with very little effort – the modular arrangement system by ZOONLAB. ZOONLAB – for your work with small and large animals. Not only the vast product range but also decades of experience, first-class quality and innovative and individual product solutions are reasons why customers from all over the world rely on ZOONLAB. Learn more about us – we are looking forward to your enquiry: [email protected] or +49 / 23 05 / 97 30 40

ZOONLAB GmbH | Hermannstraße 6 | 44579 Castrop-Rauxel | Germany | www.zoonlab.de

Editorial

Laboratory Animals 2020, Vol. 54(1) 16
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219898105
journals.sagepub.com/home/lan

Severity Assessment in animal based research

Andre Bleich¹, Marion Bankstahl¹, Paulin Jirkof², Jan-Bas Prins³ and Rene H Tolba⁴

Animal based research needs to strictly adhere to the 3R principle (replace, refine, reduce), not only for the ethical justification of the use of animals in science, but also to ensure the highest quality of data through standardization. With the implementation of EU Directive 2010/63 in national law, it is mandatory to perform a ‘severity classification of procedures on the basis of estimated levels of pain, suffering, distress and lasting harm that is inflicted on the animals’. However, scientifically sound scales as well as broadly accepted and applicable model-specific measures to grade this in routine work are lacking. This deficit in the research landscape affects ethical considerations as well as the quality of animal based research data. In addition, the discrepancy between current regulations and scientific knowledge currently hinders biomedical research that still requires testing hypotheses in animals. Therefore, the current Special Issue ‘Severity Assessment’ contains reports from relevant research projects and networks that were implemented to overcome this situation.

Insight is needed from various fields to gain model-specific methods for assessing severity that can be applied in routine settings. These methods should provide objective and gradable parameters that enable the correlation of test results with severity grades, for example, those defined by EU regulations. Existing methods have to be validated and refined; new methods have to be developed. In this context, minimally invasive or non-invasive surveillance and imaging approaches will be indispensable. Based on this knowledge, research models and procedures will be refined to minimize discomfort by identifying and addressing critical stress factors. As stress also contributes to variation, detecting these critical factors will also help to better standardize animal models and to enhance the quality of research data based on animal experimentation.

We hope that this Special Issue will foster a discussion on Severity Assessment and give readers an insight into current research topics.

¹Institute for Laboratory Animal Science and Central Animal Facility, Hannover Medical School, Germany
²Department of and 3Rs, University of Zurich, Switzerland
³Biological Research Facility, Francis Crick Institute, London, UK
⁴Institute for Laboratory Animal Science and Experimental Surgery, RWTH Aachen University, Germany

Corresponding author: Rene Tolba, Institute for Laboratory Animal Science and Experimental Surgery, RWTH Aachen University, Pauwels Str. 30, 52074 Aachen, Germany. Email: [email protected]

Special Issue: Severity Assessment

Laboratory Animals 2020, Vol. 54(1) 17–25
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219862004
journals.sagepub.com/home/lan

Nest-building performance in rats: impact of vendor, experience, and sex

Kerstin Schwabe¹*, Lena Boldt²*, André Bleich³, Roelof Maarten van Dijk², Simeon Oscar Arnulfo Helgers¹, Christine Häger³, Marta Nowakowska², Ann-Kristin Riedesel¹, Katharina Schönhoff², Birgitta Struve³, Jürgen Wittek¹ and Heidrun Potschka²

Abstract

Nest building behavior has been extensively applied as a parameter for severity assessment in mice. In contrast, only a limited number of studies have reported nest building data from rats. Here, we assessed nest building in rats in two different facilities, addressing the hypotheses that the vendor, previous experience with the nesting material, and the sex of the rats have an impact on performance. Data from two study sites and three raters were compared to obtain information about the robustness of nest complexity scoring. The findings demonstrate a generally poor nest building performance in rats, with a pronounced day-to-day fluctuation and site-specific differences. Application of a newly developed scoring system resulted in an intermediate inter-rater reliability. Previous experience with the nesting material did not exert a consistent impact on nest complexity scores. Sex differences proved to depend on vendor and animal facility, without consistent findings supporting a superior performance in female or male rats. In conclusion, our findings argue against a robust and consistent influence of sex and familiarity with the nesting material. The comparison between facilities suggests that local conditions need to be considered as influencing factors, which should be explored in more detail by future multicenter approaches. Considering the day-to-day fluctuation and the intermediate inter-rater reliability, we strongly recommend basing nest complexity evaluation on means from several consecutive days, analyzed by a group of experienced raters.

Keywords: nest complexity, Enviro-dri®, severity assessment, behavior, rats

Date received: 18 March 2019; accepted: 14 June 2019

¹Department of Neurosurgery, Hanover Medical School, Germany
²Institute of Pharmacology, Toxicology, and Pharmacy, Ludwig-Maximilians-University Munich, Germany
³Institute for Laboratory Animal Science and Central Animal Facility, Hanover Medical School, Germany
*Shared first authorship

Corresponding author: Heidrun Potschka, Institute of Pharmacology, Toxicology, and Pharmacy, Ludwig-Maximilians-University, Koeniginstr. 16, D-80539 Munich, Germany. Email: [email protected]

Nest building behavior has been extensively validated as a well-being parameter in laboratory mice.1–4 This species shows a strong natural motivation to engage in nest building activity, probably to create a cage subarea as a shelter with an optimized microclimate concerning temperature and light exposure. Assessment of nest complexity and level of soiling revealed that these parameters can be significantly affected by distress and pain in mice, and can therefore be useful for severity assessment.1,2,5–13

In rats it has been reported that although rats preferred cages with nesting material,14–16 they did not construct complex nests when coarse paper strips (Enviro-dri®) were offered as enrichment.16 This led van Loo and Baumans to hypothesize that in rats nest building behavior is not genetically determined, but needs to be learned.17 To test this hypothesis they compared nest building in female and male Wistar rats (U:WU) with a different history of exposure to two different types of nest material (Kleenex® tissues or Enviro-dri®). Data from this study revealed that the older the rats were at first exposure to Enviro-dri®, the poorer their nest building performance, although nest building performance improved over time.17 Despite this 2004 report, which suggested the importance of exposure at a young age in rats and recommended nesting material for enrichment in rats, and despite the fact that nest building has been applied as a well-being parameter in mice, only a few studies have assessed nest building behavior in rats, apart from studies focusing on maternal behavior.18,19 This might be related to problems with nest building performance in rats.

In line with this assumption, we also faced such problems when we initiated studies with nest building assessment in a research consortium focused on evidence-based severity assessment in rats. To explore possible reasons for these difficulties we assessed nest building in rats in two different facilities in a systematic manner. We addressed the hypotheses that the vendor, previous experience with the nesting material, and the sex of the rats have an impact on nest building performance. In addition, we compared data between study sites and between raters to obtain information about the robustness of nest complexity scoring. For these analyses, Sprague Dawley rats were selected as the most frequently used rat stock worldwide.

Materials and methods

Animals and experimental design

Researchers at Ludwig-Maximilians-University, Munich (LMU) and Hanover Medical School (MHH) performed experiments to investigate the influence of (i) vendor, (ii) familiarity with the nesting material Enviro-dri® (Claus GmbH, Limburgerhof, Germany), and (iii) sex on nest building performance. The choice of the nesting material was based on pilot studies and previous experience of the research consortium.21–23 Also based on pilot data, a power analysis was carried out and indicated a necessary group size of n = 5 pairs. This minimum number was respected in all subgroups of animals.

For the "naïve" group (i.e. unfamiliar with Enviro-dri®), rats were obtained from Charles River Laboratories (Sulzfeld, Germany; CR), where the nesting material is conventional pulp (Tork Standard Papierwischtücher two-layer, Mannheim, Germany). These rats were tested either at MHH or LMU, hereafter termed MHH/CR/pulp and LMU/CR/pulp. For the "familiar" group, rats that were raised with the nesting material Enviro-dri® from birth onward were either obtained from Envigo (ENV) at LMU or bred in-house as F1 from SD rats purchased from CR at MHH. These groups were termed LMU/ENV/Enviro or MHH/CR-F1/Enviro.

At LMU, in total 20 virgin female and 20 virgin male Crl:CD (SD) rats were investigated. Male (n = 10) and female (n = 10) rats were obtained from Envigo (the Netherlands) and CR at approximately 12 weeks old. Rats were pair-housed by sex and vendor, i.e. familiarity with nesting material.

At MHH, in total 28 virgin female and 26 virgin male Crl:CD (SD) rats of approximately 12 weeks were used, with 14 female and 12 male rats from CR, and 14 female and 14 male rats from the F1 generation (parental generation from CR) bred in the Central Animal Facility of MHH. Rats were pair-housed by sex and familiarity with nesting material.

For additional information regarding the animal husbandry, see supplemental material.

All animals were housed for other studies or training purposes, as approved by the government of Upper Bavaria (license numbers ROB-55.2-2532.Vet_03-15-11 and ROB-55.2-2532.Vet_02-14-120) and the Lower Saxony State Office for Consumer Protection and Food Safety, LAVES (AZ 16/2315, 15/1933 and 17/2477). All investigations were conducted in line with the German Animal Welfare Act and the EU directive 2010/63/EU. The health status of the animal facilities of the two institutes met the FELASA guidelines.20

Rats were housed in pairs and received 28 g of Enviro-dri® once weekly upon cage cleaning. Nesting material was placed in the back left corner of the cage. The amount of 28 g was chosen based on the outcome of pilot studies testing 14, 21, and 28 g for pair-housed rats. The placement of the material was standardized across facilities; the only criterion for choosing the corner was that it should be located opposite the feeding rack.

Nest complexity scoring

Every morning between 08:00 and 09:00, photos of the nests were taken for image-based scoring, including at least one side view and one top-down view. These photos were then arranged as one photo assembly for each pair of rats and day.

For scoring of the nest building behavior we first followed a score modified according to van Loo and Baumans (see Table 1).17 After analysis of the scores by two raters, which showed tremendous disparities, a discussion among the collaborating groups revealed that the nest complexity and shapes were not described in unambiguous detail by the score levels. Review of nest complexity scores in mice indicated the use of more detailed grading systems with 4–6 levels.2,24,25 Following this concept, a more detailed scoring system was developed (Figure 1, Table 1). With this score, the complexity of 120 nests (based on two to three photos per nest) from the different animal facilities was described in detail by two experienced raters, who subsequently agreed on what score should be given and how this score should be described best.

Table 1. Nest scoring scheme.

Score 0
Old score: the nest material is almost untouched
New score: the nest material is almost untouched or distributed throughout the cage

Score 1
Old score: a nesting area is clearly visible, the nest is flat
New score: the nest material is touched and distributed over more than half of the cage's floor area without a visible nesting area

Score 2
Old score: the nest has a slightly dented shape
New score: a nesting area is clearly visible, which is smaller than half of the cage base area (i.e. less than 1/5 cage Type IV Makrolon cage), the nest is flat with no visible indentation, the edges of the nest may have a frayed shape

Score 3
Old score: the nest is deep or caved
New score: the nest area is clearly defined and delineated and the nest has either an appreciable height (at least 1/5 of the cage height) with a minimal indentation or is flat, but with a prominent indentation

Score 4
Old score: (none)
New score: the nest has both an appreciable height (at least 1/4 of the cage height) and a prominent indentation

The robustness of this scoring system was tested by three raters (referred to as raters 1–3) from both institutes who were not aware of the group allocation (including history of nesting material, vendor, and sex) or origin of the animals. Raters were first provided with the training set of 120 nests described above. The training set had to be passed with an accuracy level of at least 80%. Thereafter, all raters analyzed a comprehensive set of 861 nests from LMU and MHH. The elaborate score has been applied for all data sets provided and discussed in this manuscript. Please note that the data sets from individual raters are illustrated in Supplementary figures 2–4.

The evaluation was focused on weeks two and three following arrival in the animal facility or postnatal weeks 10–12 in the F1 generation reared at MHH (referred to as weeks 1 and 2 of evaluation). Animals at MHH that were purchased from CR (i.e. not reared at MHH with Enviro-dri®) were not exposed to nest material during the first week after arrival. Animals at LMU received nesting material from the day of their arrival. Therefore, it was possible to perform an additional analysis of nest building performance on the first days (days 1–4) following arrival at LMU (referred to as week 0).

Statistical analysis

GraphPad Prism (Version 5.04; GraphPad, La Jolla, CA, USA) and R version 3.3.2 were used for statistical analysis.26 A two-way analysis of variance (ANOVA) with a Bonferroni post-hoc test was used to compare familiarity with nesting material and sex, and to compare vendor and sex for days 1–4 after arrival at LMU. A three-way repeated-measures ANOVA followed by a post-hoc test adjusted for multiple testing using a Benjamini and Hochberg correction was used to compare the effects of site, sex and the different time windows within animals from the same vendor (CR). Data represent mean with standard deviation (SD) of the median of days 4–6 of all raters (main figures) or for individual raters (see supplementary figures). Mean nest scores were calculated based on average nest scores of days 4–6. R version 3.3.2 was used to create the timelines in Supplementary figure 1.26 The R package "irr" was used to calculate the intraclass correlation coefficient (ICC) and Kendall's coefficient of concordance W.27 Significant differences between groups are shown as asterisks (* p < 0.05); a trend is shown as an asterisk in brackets ((*) < 0.01).

Results

Course of nest complexity scores

Following exposure to new nest material, the daily scores increased at least to some extent during testing,

Figure 1. New nest score. A: score 0 = the nest material is almost untouched. B: score 1 = nest material is touched and distributed over more than half of the cage's floor area without a visible nesting area. C: score 2 = a nesting area is clearly visible, which is smaller than half of the cage base area (i.e. less than 1/5 cage Type IV Makrolon cage), the nest is flat with no visible indentation, the edges of the nest may have a frayed shape. D: score 3 = the nest area is clearly defined and delineated and the nest has either an appreciable height (at least 1/5 of the cage height) with a minimal indentation or is flat but with a prominent indentation. E: score 4 = the nest has both an appreciable height (at least 1/4 of the cage height) and a prominent indentation.

but showed a relatively high level of fluctuation (Supplementary figure 1). Mean nest scores of days 4–6 of each week were used to compare overall performance; the selection of these days was based on previous experiments in which nest quality was observed to plateau after day 4 and often to decrease again on day 7.22

Inter-rater reliability scores
Using a scoring scheme modified according to van Loo and Baumans,17 the different raters reached an inter-rater reliability, as measured by the ICC, of 0.54 (p < 0.001) and a Kendall's coefficient of concordance W of 0.60 (p = 0.0475). Using the more detailed training and scoring scheme, the inter-rater reliability reached an ICC of 0.79 (p < 0.001) and a Kendall's W of 0.63 (p < 0.001). Results from individual raters can be found in Supplementary figures 2–4.

Comparison between sites, origin, and sex
A full comparison of nest building scores was possible using the CR groups, for which data are available for both sites (Figure 2), both sexes, and the two consecutive weeks following introduction of the nesting material. The three-way ANOVA indicated an overall significant difference between the two sites (F(1,37) = 5.884, p = 0.0203) and an overall difference between weeks 1 and 2 (F(1,37) = 5.328, p = 0.0267). Individual comparisons revealed that a difference between sites was only evident for female rats, which reached higher nest complexity scores at LMU than at MHH in week 2 (p = 0.0141). Further comparisons confirmed a significant difference between weeks 1 and 2 (p = 0.0449) in females at LMU. Lastly, the only sex difference found was between the MHH groups in the second week, with males building more complex nests than females (p = 0.0380).
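The agreement statistics above were computed with the R package "irr".27 To show what the two coefficients measure, the following Python sketch reimplements Kendall's W with the tie correction (the variant irr returns for `kendall(ratings, correct = TRUE)`) and a one-way, single-measure ICC. It is an illustration under assumptions, not the authors' analysis: the text does not state which ICC model was used, so the one-way form (irr's default for `icc()`) is assumed, no p-values are computed, and the function names are hypothetical. Ratings are arranged with one row per nest and one column per rater.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kendall's W; ratings[i][j] = score of nest i by rater j."""
    n = len(ratings)     # nests (subjects)
    m = len(ratings[0])  # raters
    rank_sums = [0.0] * n
    tie_term = 0.0
    for j in range(m):
        column = [row[j] for row in ratings]
        for i, r in enumerate(average_ranks(column)):
            rank_sums[i] += r
        for value in set(column):
            t = column.count(value)
            tie_term += t ** 3 - t  # contributes 0 for untied scores
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)


def icc_oneway_single(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects, single-measure ICC: (MSR - MSW) / (MSR + (k-1) MSW)."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((rm - grand) ** 2 for rm in row_means) / (n - 1)
    ms_within = sum(
        (x - rm) ** 2 for row, rm in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Illustrative ratings only (not study data): 5 nests scored by 3 raters.
example = [[0, 0, 1], [1, 1, 1], [2, 3, 2], [3, 3, 3], [4, 3, 4]]
print(round(kendalls_w(example), 3), round(icc_oneway_single(example), 3))
```

Kendall's W only looks at how raters rank the nests, whereas the ICC also penalizes raters who agree on the ordering but differ in absolute score level. That is one reason the two coefficients reported above need not move together.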

Figure 2. Comparison of nest scores of Charles River rats at two different sites (LMU and MHH), two sexes and two time points. Shown are the mean nest scores for days 4–6 (+/- SD) of each week after receiving new nest material. Significant differences between groups are indicated by asterisks (three-way ANOVA with post-hoc test adjusted for multiple testing using Benjamini and Hochberg; * p < 0.05).
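The written descriptions of the detailed score (Figure 1) can be made explicit as a small decision function. This is only an illustrative sketch: raters in this study scored photographs by eye, and the `NestObservation` fields, the reading of "flat" as a height below 1/5 of the cage height, and returning `None` for combinations the descriptions do not cover are all assumptions, not part of the published method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NestObservation:
    """Hypothetical, simplified description of one nest (all fields are assumptions)."""
    material_touched: bool     # has the nesting material been manipulated at all?
    spread_fraction: float     # fraction of the cage floor covered by dispersed material
    nest_area_visible: bool    # is a distinct nesting area clearly visible?
    nest_area_fraction: float  # nesting area as a fraction of the cage base area
    height_fraction: float     # nest height as a fraction of the cage height
    indentation: str           # "none", "minimal" or "prominent"


HEIGHT_SCORE_3 = 1 / 5  # "appreciable height" in the score 3 description
HEIGHT_SCORE_4 = 1 / 4  # "appreciable height" in the score 4 description
FLAT_BELOW = 1 / 5      # assumption: a nest lower than this counts as "flat"


def detailed_nest_score(obs: NestObservation) -> Optional[int]:
    """Return a 0-4 score, or None if the written descriptions do not cover the case."""
    if not obs.material_touched:
        return 0
    if not obs.nest_area_visible:
        # Score 1: material spread over more than half of the floor, no nesting area.
        return 1 if obs.spread_fraction > 0.5 else None
    if obs.height_fraction >= HEIGHT_SCORE_4 and obs.indentation == "prominent":
        return 4
    if (obs.height_fraction >= HEIGHT_SCORE_3 and obs.indentation == "minimal") or (
        obs.indentation == "prominent"
    ):
        # Score 3: appreciable height with minimal indentation, or a prominent
        # indentation that does not reach the score 4 height.
        return 3
    if (
        obs.nest_area_fraction < 0.5
        and obs.height_fraction < FLAT_BELOW
        and obs.indentation == "none"
    ):
        return 2
    return None  # e.g. a large flat nesting area: left to rater judgment
```

Writing the rules out this way shows where the verbal descriptions leave gaps (the `None` branches). Those gaps are where independent raters are most likely to disagree.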

Figure 3. Mean nest score comparison of vendor and sex for groups tested either at LMU (a) or at MHH (b). Shown are the mean nest scores for days 4–6 (+/- SD) after receiving new nest material in the second week of nest scoring. At LMU, rats from Envigo (LMU/ENV/Enviro) were compared with rats from Charles River (LMU/CR/pulp) (a); at MHH, rats from Charles River reared at MHH (MHH/CR-F1/Enviro) were compared with rats from Charles River (MHH/CR/pulp) (b). Significant differences between groups are indicated by asterisks (two-way ANOVA with a Bonferroni post-hoc test; * p < 0.05).
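Figures 2 and 3 rely on two different multiple-comparison adjustments: Benjamini and Hochberg, which controls the false discovery rate, and Bonferroni, which controls the family-wise error rate and is more conservative. The sketch below shows what each does to a set of raw p-values, in the form returned by R's `p.adjust(p, method = "BH")` and `p.adjust(p, method = "bonferroni")`. The p-values are invented for illustration. The text does not state whether the GraphPad and R post-hoc procedures used here apply the adjustments in exactly this way.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # illustrative p-values only
print("Bonferroni:", bonferroni(raw))
print("BH:        ", benjamini_hochberg(raw))
```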

Scores of individual raters were characterized by a relatively high level of variation, as already indicated by the ICC analyses. Results of raters 2 and 3 proved to be more consistent than those of rater 1 (Supplementary figure 2(a) to (c)).
Comparison of rats at LMU did not reveal significant differences between animals supplied by Envigo and CR (Figure 3(a)). To test whether early exposure to Enviro-dri® has an impact on nest building performance in animals of the same breed, nest scores were compared between animals from CR (MHH/CR/pulp) and animals reared with Enviro-dri® (MHH/CR-F1/Enviro) at MHH. This comparison did not indicate a significant difference (Figure 3(b), Supplementary figure 3(d) to (f)). Direct comparison between females and males revealed only a tendency towards better nest building in females than in males for two of the three raters (Supplementary figure 3(a) and (b), LMU/ENV/Enviro female against LMU/ENV/Enviro male: p < 0.1). This was confirmed by the mean scores of all raters (Figure 3(a), LMU/ENV/Enviro female against LMU/ENV/Enviro male: p < 0.05). Surprisingly, statistical analysis revealed significantly more complex nests in males than in females in the MHH/CR/pulp group (Figure 3(b) and Supplementary figure 3(d) to (f), sex (F(1,23) = 5.004, p = 0.035), MHH/CR/pulp female against MHH/CR/pulp male: p < 0.05).

Course of nest complexity following arrival or first exposure to nest material
In the animal facility at LMU, animals were offered nest material immediately following arrival in accordance with the local regulations. During the first days of exposure (mean scores of days 1–4, week 0) a pronounced group difference was evident, with nest complexity scores of female SD rats from Envigo, i.e. the LMU/ENV/Enviro group, exceeding scores in all other groups by rater 1 (Figure 4 and Supplementary figure 4 for raters 1–3; interaction (F(1,16) = 25.39, p < 0.001); sex (F(1,16) = 7.087, p = 0.017); vendor (F(1,16) = 17.5, p < 0.001); LMU/ENV/Enviro female against all others p < 0.001).

Figure 4. Comparison of mean nest scores for days 1–4 after arrival at LMU Munich for male and female pairs from Envigo (LMU/ENV/Enviro) and Charles River (LMU/CR/pulp). Increased nest scores were found in female rats from Envigo as compared with all other groups. Significant differences between groups are indicated by asterisks (two-way ANOVA with a Bonferroni post-hoc test; * p < 0.05). Shown are the mean nest scores for days 4–6 (+/- SD).

Discussion
To explore the suitability of nest complexity scoring as a basis for severity assessment in rats, we compared nest building performance of virgin male and female pair-housed SD rats from different vendors and with different familiarity with the nesting material Enviro-dri® in two animal facilities. Considering that only scores of 3 or higher indicate a more complex nest structure, it is notable that such median score levels were only rarely reached at MHH, and only on single days at LMU. Only for female SD rats purchased from Envigo and kept at LMU did the median nest score reach levels above 3 on the majority of testing days. These data indicate a generally poor nest building performance of rats, in line with previous descriptions by Manser and colleagues.15
In mice, different experimental paradigms that cause distress and pain can result in a reduction of nest complexity scores.1,2,5–13 Therefore, Jirkof suggested that alterations in nest building behavior and changes in burrowing behavior, along with specific disease markers, might serve as valuable indicators of well-being in mice.2 The poor nest building performance in rats, however, raises doubt about a comparable suitability of nest building behavior as a generalizable severity assessment parameter. Nevertheless, nest building in female SD rats purchased from Envigo, i.e. the rats with the highest nest complexity scores in the present study, has previously been reported as suitable for severity assessment in a rat model of epilepsy.22
Another drawback of nest scoring seems to be the high inter-rater variability, which is a general problem of "qualitative scoring systems". With the application of a simple scoring system following the description by van Loo and Baumans,17 we observed tremendous discrepancies in the scores given. Thus, a more detailed scoring system was developed, which, together with a training set, improved the inter-rater reliability at least to some extent. Nevertheless, even this more detailed grading system resulted in only intermediate levels of inter-rater reliability, which poses problems with regard to the reproducibility of data sets. Therefore, in line with a statement by Jirkof,2 we recommend carefully considering inter-rater variability and using the mean values of scores provided by a group of raters to achieve more robust findings. Moreover, considering the progressive increase during the first days and the day-to-day fluctuations, we strongly recommend using the mean of days with rather stable nest complexity.
In mice, a high level of motivation for nest building has generally been reported in both sexes.28,29 In rats, previous studies also did not find differences between males and females, although often only one sex was tested.15,17,30,31 Overall, we also found no consistent differences between males and females, although nest complexity scores may differ between sexes at different locations. Male SD rats purchased from CR constructed more complex nests than female rats at MHH, whereas nest building performance was better in female than in male rats from Envigo at LMU. Thus, relevant differences between sexes can occur, which should be considered in the study design.
It has been suggested that the nest building performance of rats largely depends on their previous experience, and that exposure to a specific nest material during development results in improved nest building performance later on.17 This theory seems to be confirmed by the highest nest complexity scores being observed in female SD rats from Envigo that grew up with Enviro-dri® as nesting material. However, nest building in SD rats purchased from CR, which grew up with standard pulp as nesting material, did not differ from that of the F1 offspring raised with Enviro-dri® at MHH, i.e. rats with the same genetic background. This argues against a relevance of previous exposure to the specific nest material. In fact, the maximum individual nest scores in male SD rats at MHH were even higher in those animals that grew up at CR without exposure to Enviro-dri®.
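The recommendation to average over days with stable nest complexity and over a group of raters can be expressed as a two-step aggregation. The sketch below assumes a hypothetical data layout: a dictionary mapping each rater to that rater's daily scores for one cage in one week. It uses days 4–6 as the stable window, following the Methods. The `within_rater` parameter is included because the Methods mention both the mean and the median of days 4–6. All names are illustrative.

```python
from statistics import mean, median
from typing import Callable, Dict, Iterable, Sequence

# Hypothetical layout: {rater_id: {day_of_week: score}} for one cage in one week.
DailyScores = Dict[str, Dict[int, float]]


def cage_nest_score(
    scores: DailyScores,
    days: Iterable[int] = (4, 5, 6),
    within_rater: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Summarise each rater's scores over the stable window, then average across raters."""
    days = tuple(days)
    per_rater = []
    for rater, daily in scores.items():
        window = [daily[d] for d in days if d in daily]
        if not window:
            raise ValueError(f"rater {rater!r} has no scores for days {days}")
        per_rater.append(within_rater(window))
    return mean(per_rater)


example = {
    "rater1": {1: 0, 2: 1, 3: 1, 4: 2, 5: 3, 6: 2, 7: 1},  # illustrative numbers only
    "rater2": {4: 2, 5: 2, 6: 2},
}
print(cage_nest_score(example))                       # mean of days 4-6, then across raters
print(cage_nest_score(example, within_rater=median))  # median-of-days variant
```

Summarising within each rater first keeps every rater's contribution equal even if one rater missed a day, and days 1–3 and day 7 are simply ignored.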

Taken together, these data do not support the "importance of learning young" as suggested by van Loo and Baumans.17 On the other hand, the progressive increase of nest building performance observed in female SD rats from CR at LMU indicates that the impact of history and experience should not be neglected, although the extent of its impact seems to depend on the local husbandry conditions, the sex of the animals, and other factors.
Finally, an inter-site comparison between SD rats supplied by CR showed higher nest building performance, especially of female SD rats, at LMU, despite the fact that room climate and light conditions had been set to comparable standards according to European guidelines in all animal facilities, with continuous daily control. Although rats have a more favourable body surface-to-weight ratio, and therefore less pronounced needs in terms of support for thermoregulation, an impact of the surrounding air temperature on nest building activity in rats was described as early as 1927.32 Moreover, studies in mice indicated a major influence of the room temperature.33 Thus, we speculate that differences in the microclimate related to the local technical systems might have resulted in differences in the felt temperature, with an impact on the motivation for nest building. In this context, it is of interest that the air exchange rates at LMU were considerably higher than at MHH. Differences in the transport distance from the commercial vendor to the animal facility, as well as in the handling of animals by personnel in the facilities, may also constitute important variables that can significantly influence behavioral patterns. To determine factors influencing the robustness of nest building data, it may be of future interest to assess nest building performance in rodents in a multicenter study, which has been shown to be a valuable approach to gain information about robustness, reproducibility, and confounding factors.34,35
In conclusion, our findings demonstrate a generally poor nest building performance in virgin pair-housed rats as compared with mice. The data argue against a robust and consistent influence of sex and of familiarity with the specific nesting material. The comparison between facilities suggests that local conditions need to be considered as influencing factors, which should be explored in more detail by multicenter approaches. The day-to-day fluctuation of scores and the intermediate inter-rater reliability lead us to recommend basing nest complexity scoring on means from several consecutive days analyzed by a group of experienced raters.

Acknowledgments
The authors thank Sieglinde Fischlein, Verena Buchecker, Sabine Vican, Katharina Gabriel, Uwe Birett, Monika von Iterson, and Daniel Ahrens for their excellent technical assistance and Ute Lindauer, Annika Bach, and Ekaterina Harder for their input and support in developing the manuscript.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was funded in part by the German Research Foundation (Deutsche Forschungsgemeinschaft - DFG) FOR 2591 Consortium. Reference numbers are listed in the supplemental material.

ORCID iDs
André Bleich https://orcid.org/0000-0002-3438-0254
Christine Häger https://orcid.org/0000-0002-6971-9780
Heidrun Potschka https://orcid.org/0000-0003-1506-0252

References
1. Gaskill BN, Karas AZ, Garner JP, et al. Nest building as an indicator of health and welfare in laboratory mice. J Vis Exp 2013; 82: e51012.
2. Jirkof P. Burrowing and nest building behavior as indicators of well-being in mice. J Neurosci Methods 2014; 234: 139–146.
3. Hager C, Keubler LM, Biernot S, et al. Time to integrate to nest test evaluation in a mouse DSS-colitis model. PLoS One 2015; 10: e0143824.
4. Rock ML, Karas AZ, Rodriguez KB, et al. The time-to-integrate-to-nest test as an indicator of wellbeing in laboratory mice. J Am Assoc Lab Anim Sci 2014; 53: 24–28.
5. Jirkof P, Cesarovic N, Rettich A, et al. Individual housing of female mice: influence on postsurgical behaviour and recovery. Lab Anim 2012; 46: 325–334.
6. Jirkof P, Fleischmann T, Cesarovic N, et al. Assessment of postsurgical distress and pain in laboratory mice by nest complexity scoring. Lab Anim 2013; 47: 153–161.
7. Otabi H, Goto T, Okayama T, et al. Subchronic and mild social defeat stress alter mouse nest building behavior. Behav Processes 2016; 122: 21–25.
8. Greenberg GD, Huang LC, Spence SE, et al. Nest building is a novel method for indexing severity of alcohol withdrawal in mice. Behav Brain Res 2016; 302: 182–190.
9. Lerch S, Dormann C, Brandwein C, et al. The scent of stress: environmental challenge in the peripartum environment of mice affects emotional behaviours of the adult offspring in a sex-specific manner. Lab Anim 2016; 50: 167–178.
10. Moore ES, Cleland TA, Williams WO, et al. Comparing phlebotomy by tail tip amputation, facial vein puncture, and tail vein incision in C57BL/6 mice by using physiologic and behavioral metrics of pain and distress. J Am Assoc Lab Anim Sci 2017; 56: 307–317.

11. Hohlbaum K, Bert B, Dietze S, et al. Severity classification of repeated isoflurane in C57BL/6JRj mice: assessing the degree of distress. PLoS One 2017; 12: e0179588.
12. Harikrishnan VS, Hansen AK, Abelson KS, et al. A comparison of various methods of blood sampling in mice and rats: effects on animal welfare. Lab Anim 2018; 52: 253–264.
13. Hohlbaum K, Bert B, Dietze S, et al. Systematic assessment of well-being in mice for procedures using general anesthesia. J Vis Exp 2018; 133: e57046.
14. Bradshaw AL and Poling A. Choice by rats for enriched versus standard home cages: plastic pipes, wood platforms, wood chips, and paper towels as enrichment items. J Exp Anal Behav 1991; 55: 245–250.
15. Manser CE, Broom DM, Overend P, et al. Investigations into the preferences of laboratory rats for nest-boxes and nesting materials. Lab Anim 1998; 32: 23–35.
16. Manser CE, Broom DM, Overend P, et al. Operant studies to determine the strength of preference in laboratory rats for nest-boxes and nesting materials. Lab Anim 1998; 32: 36–41.
17. Van Loo PL and Baumans V. The importance of learning young: the use of nesting material in laboratory rats. Lab Anim 2004; 38: 17–24.
18. Reis AR, de Azevedo MS, de Souza MA, et al. Neonatal handling alters the structure of maternal behavior and affects mother-pup bonding. Behav Brain Res 2014; 265: 216–228.
19. Boero G, Biggio F, Pisu MG, et al. Combined effect of gestational stress and postpartum stress on maternal care in rats. Physiol Behav 2018; 184: 172–178.
20. Guillen J. FELASA guidelines and recommendations. J Am Assoc Lab Anim Sci 2012; 51: 311–321.
21. Di Liberto V, van Dijk RM, Brendel M, et al. Imaging correlates of behavioral impairments: an experimental PET study in the rat pilocarpine epilepsy model. Neurobiol Dis 2018; 118: 9–21.
22. Moller C, Wolf F, van Dijk RM, et al. Toward evidence-based severity assessment in rat models with repeated seizures. I. Electrical kindling. Epilepsia 2018; 59: 765–777.
23. van Dijk RM, Di Liberto V, Brendel M, et al. Imaging biomarkers of behavioral impairments: a pilot micro-positron emission tomographic study in a rat electrical post-status epilepticus model. Epilepsia 2018; 59: 2194–2205.
24. Deacon RM. Assessing nest building in mice. Nat Protoc 2006; 1: 1117–1119.
25. Hess SE, Rohr S, Dufour BD, et al. Home improvement: C57BL/6J mice given more naturalistic nesting materials build better nests. J Am Assoc Lab Anim Sci 2008; 47: 25–31.
26. R Core Team. R: a language and environment for statistical computing. R Core Team, 2013.
27. Gamer M, Lemon J, Gamer MM, et al. Package 'irr': various coefficients of interrater reliability and agreement, 2012.
28. Sherwin CM. Observations on the prevalence of nest-building in non-breeding TO strain mice and their use of two nesting materials. Lab Anim 1997; 31: 125–132.
29. Lisk RD, Pretlow RA 3rd and Friedman SM. Hormonal stimulation necessary for elicitation of maternal nest-building in the mouse (Mus musculus). Anim Behav 1969; 17: 730–737.
30. Moussaoui N, Larauche M, Biraud M, et al. Limited nesting stress alters maternal behavior and in vivo intestinal permeability in male Wistar pup rats. PLoS One 2016; 11: e0155037.
31. Sun M, Huang P, Wang Y, et al. Anticonvulsants lamotrigine and riluzole disrupt maternal behavior in postpartum female rats. Pharmacol Biochem Behav 2018; 168: 43–50.
32. Kinder EF. A study of the nest-building activity of the albino rat. J Exp Zool 1927; 47: 117–161.
33. Gaskill BN, Gordon CJ, Pajor EA, et al. Heat or insulation: behavioral titration of mouse preference for warmth or access to a nest. PLoS One 2012; 7: e32799.
34. Wodarski R, Delaney A, Ultenius C, et al. Cross-centre replication of suppressed burrowing behaviour as an ethologically relevant pain outcome measure in the rat: a prospective multicentre study. Pain 2016; 157: 2350–2365.
35. Mandillo S, Tucci V, Holter SM, et al. Reliability, robustness, and reproducibility in mouse behavioral phenotyping: a cross-laboratory study. Physiol Genomics 2008; 34: 243–255.

Résumé
Nest-building behavior has been used extensively as a parameter for severity assessment in mice. By contrast, only a limited number of studies have reported nest-building data for rats. We therefore assessed nest building in rats in two different facilities, based on the hypothesis that the vendor, previous experience with the nesting material, and the sex of the rats have an impact on performance. Data from two study sites and three raters were compared to obtain information on the robustness of nest complexity assessment. The results demonstrate a generally poor nest-building performance in rats, with substantial day-to-day fluctuations and site-specific differences. Application of a new scoring system resulted in intermediate inter-rater reliability. Previous experience with the nesting material had no systematic influence on nest complexity scores. Sex differences proved to depend on the vendor and the animal facility, with no consistent results supporting superior performance in male or female rats.

In conclusion, our results argue against a robust and consistent influence of sex and of familiarity with the nesting material. The comparison between facilities suggests that local conditions must be considered as influencing factors, which should be studied in more detail by future multicenter approaches. In view of the day-to-day fluctuation and the intermediate inter-rater reliability, we strongly recommend basing the assessment of nest complexity on data collected over several days and analyzed by a group of experienced raters.

Abstract
Nest-building behavior is widely used as a parameter for severity assessment in mice. By contrast, only a limited number of studies on nest-building data from rats exist. In the present study, we assessed nest building in rats in two different facilities, addressing the hypotheses that the vendor, previous experience with the nesting material, and the sex of the rats influence performance. Data from two study sites and three raters were compared to obtain information on the robustness of nest complexity scoring. The results show a generally poor nest-building performance in rats, with pronounced fluctuations between individual days and site-specific differences. Application of a newly developed scoring system achieved intermediate inter-rater reliability. Previous experience with the nesting material had no consistent influence on nest complexity scores. Sex differences proved to depend on the vendor and the animal facility, without consistent results that would demonstrate better performance in female or male rats. In summary, our results argue against a robust and consistent influence of sex and familiarity with the nesting material. The comparison between facilities suggests that local conditions must be regarded as influencing factors, which should be investigated in more detail by future multicenter approaches. In view of the daily fluctuations and the intermediate inter-rater reliability, we strongly recommend basing the assessment of nest complexity on means from several consecutive days, analyzed by a group of experienced raters.

Resumen
Nest-building behavior has been applied intensively as a parameter for severity assessment in rodents. By contrast, only a limited number of studies have provided data on nest building by rats. In the present study, we assessed nest building in rats in two different facilities, addressing the hypotheses that the supplier, previous experience with the nesting material, and the sex of the rats have an impact on performance. Data from two study sites and three raters were compared to obtain information on the robustness of nest complexity scoring. The findings demonstrate poor nest-building performance by rats, with pronounced daily fluctuation and site-specific differences. Application of a new scoring system produced intermediate inter-rater reliability. Previous experience with the nesting material did not have a consistent impact on nest complexity scores. Sex differences proved to depend on the supplier and the animal facility, with no consistent findings supporting superior performance in male or female rats. In conclusion, our data argue against the existence of a robust and consistent influence of sex and familiarity with the nesting material. The comparison between facilities suggests that local conditions must be taken into account as an influencing factor, which should be explored in more detail by future multicenter methods. Considering the daily fluctuation and the intermediate inter-rater reliability, we strongly recommend basing the assessment of nest complexity on averages of subsequent days analyzed by a group of specialized raters.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 26–32
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219868499
journals.sagepub.com/home/lan

Measurement of corticosterone in mice: a protocol for a mapping review

Cathalijn H.C. Leenaars1,2, Stevie van der Mierden1, Mattea Durst3, Vivian C Goerlich-Jansson2, Florenza Lüder Ripoli1, Lydia M Keubler1, Steven R Talbot1, Erin Boyle1, Anne Habedank4, Paulin Jirkof3, Lars Lewejohann4,5, Peter Gass6, René Tolba7 and André Bleich1

Abstract
Severity assessment for experiments conducted with laboratory animals is still based mainly on subjective evaluations; evidence-based methods are scarce. Objective measures, among them the determination of stress hormone concentrations, can be used to aid severity assessment. Short-term increases in glucocorticoid concentrations generally reflect healthy responses to stressors, but prolonged increases may indicate impaired welfare. As mice are the most commonly used laboratory animal species, we performed a systematic mapping review of corticosterone measurements in Mus musculus to provide a full overview of specimen types (e.g. blood, urine, hair, saliva, and milk) and analysis techniques. In this publication, we share our protocol and search strategy, and our rationale for performing this systematic analysis to advance severity assessment. So far, we have screened 13,520 references and included 5337 references on primary studies with measurements of endogenous corticosterone in M. musculus. Data extraction is currently in progress. When finished, this mapping review will be a valuable resource for scientists interested in corticosterone measurements to aid severity assessment. We plan to present the data in a publication and a searchable database, which will allow for even easier retrieval of the relevant literature. These resources will aid implementation of objective measures into severity assessment.

Keywords corticosterone, mice, mapping review, systematic review, specimen, detection

Date received: 27 February 2019; accepted: 15 July 2019

1Institute for Laboratory Animal Science, Hannover Medical School, Germany
2Department of Animals in Science and Society, Utrecht University, The Netherlands
3Division of Surgical Research, University Hospital Zurich, University of Zurich, Switzerland
4German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment, Berlin, Germany
5Institute of Animal Welfare, Animal Behaviour and Laboratory Animal Science, Free University Berlin, Germany
6Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, University of Heidelberg, Mannheim Faculty, Germany
7Institute for Laboratory Animal Science and Experimental Surgery, RWTH Aachen University, Germany

Corresponding author:
André Bleich, Hannover Medical School, Carl-Neuberg-Straße 1, Hannover, Lower-Saxony, Germany.

Severity assessment is an integral part of all animal experiments performed within the European Union, as prescribed by article 15(1) of directive 2010/63/EU. The assessed severity can be categorised as "non-recovery", "mild", "moderate" or "severe". While the directive and supplementary materials provide a basis to estimate severity, the actual assessment process is still based mainly on subjective evaluations by individual scientists; evidence-based methods to grade severity are scarce.1–4

Several objective measures can be considered to aid severity assessment. These measures comprise three main categories: behaviour (spontaneous behaviours such as vocalisations and locomotion, nesting and burrowing, but also choices in preference tests),5 physiology (e.g. heart rate, respiration, body weight) and biochemistry (stress hormones).6–8 Biochemical measures do not seem to be the most popular for severity assessment, possibly because several can only be made post-mortem. However, hormones that fluctuate with experienced stress, e.g. glucocorticoids, can be measured in vivo to provide an indirect indication of the actual severity experienced by the animal, and these measurements are not necessarily invasive.

While the glucocorticoid stress response usually reflects a healthy reaction to stressors, prolonged activation of the hypothalamic-pituitary-adrenal axis generally is a reason for concern.8 Therefore, measuring glucocorticoids can be used to evaluate animal welfare, and thereby severity.9–11 It is still common to sample corticosteroids from blood.12,13 For blood sampling, the number of repeated measurements from one animal is limited by the maximum volume of blood that can be withdrawn. Moreover, most blood sampling procedures are invasive. Microdialysis allows for repeated measurements over several days, on a time scale of minutes to hours.14–16 However, it is technically challenging, and while the measurements themselves are not invasive, the animal needs to undergo surgery to implant a probe or guide cannula. Particularly for welfare-related studies, refined methods using specimens that can be collected less invasively, such as urine, hair, saliva and milk, are preferred.12,13,17–19

Corticosterone can be measured using several techniques, e.g. gas chromatography/mass spectrometry, high-performance liquid chromatography and several types of immunoassays. While the chromatography-based techniques are technically challenging, they are the most specific. Immunoassays are based on antibody binding and are therefore generally less specific. However, they are easier to perform and can provide useful results depending on the species and specimen type under investigation.20,21

Mice are the main species used in laboratory animal science; in 2011, 61% of all animals used for scientific purposes within the European Union (EU) were mice.22 In Germany in 2017, the percentage was 66% (www.bmel.de, "Tierschutz in der Forschung" [Animal welfare in research], accessed 31 January 2019). Therefore, objective measures to aid severity assessment in mice are imperative. In mice, corticosterone is the main glucocorticoid.23,24

With the number of new scientific publications increasing daily, the need for objective literature reviews rises. Unfortunately, most reviews still do not implement an explicit methodology, resulting in potential bias of the presented results that cannot be estimated by the scientists reading the review. One type of review explicitly describing the review methods is the mapping review.25 Mapping reviews comprise a comprehensive search of a wide field and present their results in a user-friendly format.26,27 The objective of a mapping review is wider and more descriptive than answering a specific research question, as is common in systematic reviews. Data extraction is thus limited, and assessment of the risk of bias in the included studies is optional. A mapping review will identify evidence clusters and evidence gaps. Systematic mapping of animal studies has started only recently.28 While several important narrative reviews on corticosterone in mice have recently been published,29,30 no systematic overview of all available work on corticosterone in mice is available to date.

It is becoming more common to publish review protocols for animal studies (refer to Pires et al.31 as an example). Protocols are published to benefit from peer review to optimise the review process, to make comprehensive search strings available to other reviewers and to share information on the ongoing effort with the scientific community before publication of the results. In line with this practice, we present the protocol of our systematic mapping review on corticosterone in mice in this publication. We have thus far completed the screening phase for inclusion of the relevant references and are in the midst of data extraction.

Protocol/methods
For transparent research practice, a non-narrative version of this protocol was posted on the Open Science Framework (www.osf.io; 23 February 2018) before we started screening the literature. To increase the chances of the protocol being found by those interested, it was furthermore posted on the Systematic Review Facility (http://syrf.org.uk/; 14 January 2019).

Research question
Because this is a mapping review, it does not follow the standard PICO format for the research question (Population, Intervention, Comparison, Outcome). We defined only the population (Mus musculus) and outcome (endogenous corticosterone concentration). This mapping review thus gathers all literature on corticosterone measurements in mice indexed up to 7 February 2018. In addition, it will answer two main questions:
1. Which specimen types and methods of detection have been used for corticosterone measurement in mice?
2. In which fields of research (animal welfare, inflammation, neuroscience, pain, and stress research) have these measurements been performed?

Search strategy
Two literature databases were searched on 7 February 2018: PubMed and Embase. Both are comprehensive databases that have indexed all included references using an internal thesaurus (MeSH for PubMed, Emtree for Embase). We used the MeSH and Emtree terms in addition to searching for author-defined keywords and title and abstract text words to retrieve all relevant references.
The search strings consist of the two elements mice and corticosterone, which were combined with "AND". The search string comprised the appropriate index terms and an extensive set of synonyms, alternative spellings and related terms. The full search strategy is provided in Table 1.
References in this latter category were ordered via the library of the Hannover Medical School for further analysis.
All references were screened independently by at least two reviewers. In case of discrepancies, the reviewers reread and discussed the reference until consensus had been reached.

Inclusion and exclusion criteria
The following three inclusion criteria were used: first, the reference had to describe a primary study; second, the study had to be done in the house mouse (M. musculus); and third, the study had to measure endogenous corticosterone.
Thus, non-primary studies, reviews without new data, studies that did not measure Study selection corticosterone, and studies that measured cortico- sterone only after exogenous administration were To optimize the work flow, a non-standard approach excluded. There were no restrictions on publication for reference screening in a single phase was used. date or language of the references. Inclusion or exclusion was based only on the title and abstract. If this information was not sufficient Data extraction to determine if the inclusion criteria had been met, the full text was immediately consulted. If the The data to be extracted were subdivided into three full text could not be retrieved online, the reference domains: bibliographic data, animal model characteris- received the label ‘‘To be determined on full text’’. tics, and outcome measures. We extract the following

Table 1. Search strategies.

Search string Database element Search string

Pubmed Mice (mice[mesh] OR mice[tiab] OR mus[tiab] OR mouse[tiab] OR murine[tiab]) Pubmed Corticosterone (corticosterone[mesh] OR corticosterone[tiab] OR corticocorticosterone [tiab] OR corticocosteroid[tiab] OR corticorterone[tiab] OR corticoserone[tiab] OR corticostcrone[tiab] OR corticosteone[tiab] OR corticoster[tiab] OR corticosterene[tiab] OR corticostereone[tiab] OR corticosteron[tiab] OR cortikosterone[tiab] OR cortikosteron[tiab] OR kortikosterone[tiab] OR kortikosteron[tiab] OR ‘kendall compound b’[tiab] OR ‘reichstein substance h’[tiab] OR corticoesterone[tiab] OR cortecosterone[tiab] OR corticossterone[tiab] OR cortcosterone[tiab]) Embase Mice (‘mouse’/exp OR mouse:ab,kw,ti OR mice:ab,kw,ti OR mus:ab,kw,ti OR musculus:ab,kw,ti OR murine:ab,kw,ti) Embase Corticosterone (‘corticosterone’/de OR ‘corticosterone blood level’/de OR ‘cortico- sterone release’/de OR ‘corticosterone’:ab,kw,ti OR ‘corticocorticosterone’:ab,kw,ti OR ‘corticocosteroid’:ab,kw,ti OR ‘corticorterone’:ab,kw,ti OR ‘corticoserone’:ab,kw,ti OR ‘corticostcrone’:ab,kw,ti OR ‘corticosteone’:ab,kw,ti OR ‘corticoster’:ab,kw,ti OR ‘corticosterene’:ab,kw,ti OR ‘corticostereone’:ab,kw,ti OR ‘corticosteron’:ab,kw,ti OR ‘cortikosterone’:ab,kw,ti OR ‘cortikosteron’:ab,kw,ti OR ‘kortikosterone’:ab,kw,ti OR ‘kortikosteron’:ab,kw,ti OR ‘kendall compound b’:ab,kw,ti OR ‘reichstein substance h’:ab,kw,ti OR ‘cortcosterone’:ab,kw,ti OR ‘corticoesterone’:ab,kw,ti OR ‘corticostron’:ab,kw,ti OR ‘cortocosterone’:ab,kw,ti) Leenaars et al. 29 bibliographic data to identify references: authors, year of publication, title, journal, issue, page number, and PubMed Embase language. The animal model characteristics we extract N = 6.046 N = 7.474 are: mouse strain, sex, and whether the mice were used Total References as an animal model for a specific human disease. Retrieved The outcome measures to be analysed in the N = 13.520 mapping are: specimen type (i.e. 
specimen wherein cor- Duplicates ticosterone was measured, e.g. serum), quantifica- N = 5.445 tion technique (e.g. radio-immunoassay), and whether the study was related to animal welfare, inflammation, Total Unique neuroscience, pain, or stress. It is possible for a References study to relate to multiple or none of the research N = 8.075 Excluded fields of interest (e.g. a study analysing the effect (Not Relevant) of repeated mild stress on neurogenesis in the hippo- N = 2.738 campus would qualify as related to neuroscience Total Studies and stress). Included For data extraction, the references are distributed N = 5.337 amongst the reviewers. Data from each reference is extracted by a single reviewer. For quality control, at Figure 1. Flow of references. least 5% of all references are randomly selected and checked by a second reviewer for errors and inconsistencies. All data are being extracted using a standardized mid-2019. Around halfway through the data extraction, sheet in Excel. To prevent variability between reviewers, 4.4% of the included references were on an animal wel- pre-specified lists of answers are used where possible. fare-related topics. For example, for sex the options are: M (male), F Besides crude numbers and percentages, we will ana- (female), B (both), or U (unknown or not given). lyse changes in research practices and research fields over time based on publication dates. For example, Data synthesis and risk of bias analysis the methods used for corticosterone measurements are expected to change over time, with radio-immunoas- The extracted data will be tabulated and summarised in says being replaced by other immunoassays and high- figures. These will show frequency of mouse strains performance liquid chromatography. Furthermore, we used, sex, corticosterone quantification method, etc. plan to make the data from the 5337 included refer- For this mapping review, no quantitative outcome ences available in a searchable database. values (e.g. 
concentrations) will be extracted, thus no meta-analyses will be performed. Discussion Due to the scope of this mapping review, a full risk of bias assessment is not viable. To provide a rough This mapping review will be an accessible resource for indication of the reporting quality of the included scientists interested in corticosterone measurements in studies, reporting frequencies will be analysed for mice. The publication will show which techniques have mouse strain, sex, specimen type and quantification been used to measure corticosterone in different speci- technique. men types over time. The database will allow scientists to easily retrieve the relevant literature as a background Preliminary results for their experiments. Both resources will aid imple- mentation of objective measures into severity assess- Our search in Pubmed retrieved 6046 references, that in ment in at least three manners. First, future study Embase 7474. After duplicate removal, 8075 references planning will benefit from improved estimation of the were imported into EROS (Early Review Organising value and limitations of integrating corticosterone Software), a web-based application, for screening. For measurements. Second, the collated evidence will aid 3382 references, the full-text had to be consulted to elucidating the biological meaning of corticosterone decide on inclusion. We included 5337 references into concentrations for severity assessment. Third, selection our mapping review. The flow of references is provided of the most appropriate animal model will benefit from in Figure 1. knowing the relevance of corticosterone measurement. Data extraction from these 5337 references is cur- A limitation of our database will be that our map- rently in progress. We anticipate analysing the results ping review was restricted to corticosterone itself. 30 Laboratory Animals 54(1)

We excluded the corticosterone metabolites from our mapping review, because preliminary searches for the metabolites indicated that the amount of literature retrieved would become unmanageable. As most relevant papers on measurements of corticosterone metabolites will mention the word "corticosterone" in the title, abstract or keywords, they will have been retrieved by our search. During the screening phase, we labelled the papers that measured corticosterone metabolites. We plan a separate review of these papers (protocol under development).

A general limitation of mapping reviews is that the amount of data extracted, and therefore the conclusions that can be drawn from them, is limited.25–27 This is inevitable to keep the mapping review process manageable. Our mapping review is, however, an excellent starting point for further in-depth reviews, as all the relevant literature up to 7 February 2018 will already have been gathered. As corticosterone is the most common stress hormone analysed in relation to severity assessment, our comprehensive analysis of the relevant literature, and its accessibility in a database, will support the implementation of more objective severity assessment strategies.

Acknowledgements

We would like to thank Alice Tillema for help in search development, and Rosalie Kempkens and Bobbie Smith for their ongoing help in extracting data.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work is funded by the DFG (FOR2591, BL 953/11-1).

ORCID iDs

Cathalijn H.C. Leenaars https://orcid.org/0000-0002-8212-7632
Lydia M Keubler https://orcid.org/0000-0002-8738-9877
Steven R Talbot https://orcid.org/0000-0002-9062-4065
Paulin Jirkof https://orcid.org/0000-0002-7225-2325
Peter Gass https://orcid.org/0000-0003-3959-6369
René Tolba https://orcid.org/0000-0002-0383-3994
André Bleich https://orcid.org/0000-0002-3438-0254

References

1. Bleich A and Tolba RH. How can we assess their suffering? German research consortium aims at defining a severity assessment framework for laboratory animals. Lab Anim 2017; 51: 667.
2. Bodden C, Siestrup S, Palme R, et al. Evidence-based severity assessment: Impact of repeated versus single open-field testing on welfare in C57BL/6J mice. Behav Brain Res 2018; 336: 261–268.
3. Häger C, Keubler LM, Talbot SR, et al. Running in the wheel: Defining individual severity levels in mice. PLoS Biol 2018; 16: e2006159.
4. Keubler LM, Tolba RH, Bleich A, et al. Severity assessment in laboratory animals: a short overview on potentially applicable parameters. Berl Munch Tierarztl Wochenschr 2018; 131: 299–303.
5. Habedank A, Kahnau P, Diederich K, et al. Severity assessment from an animal's point of view. Berl Munch Tierarztl Wochenschr 2018; 1–17.
6. Baumans V. Science-based assessment of animal welfare: laboratory animals. Rev Sci Tech 2005; 24: 503–513.
7. Morton DB and Griffiths PH. Guidelines on the recognition of pain, distress and discomfort in experimental animals and an hypothesis for assessment. Vet Rec 1985; 116: 431–436.
8. Bassett L and Buchanan-Smith HM. Effects of predictability on the welfare of captive animals. Appl Anim Behav Sci 2007; 102: 223–245.
9. Cockrem JF. Individual variation in glucocorticoid stress responses in animals. Gen Comp Endocrinol 2013; 181: 45–58.
10. Mormede P, Andanson S, Auperin B, et al. Exploration of the hypothalamic-pituitary-adrenal function as a tool to evaluate animal welfare. Physiol Behav 2007; 92: 317–339.
11. Ralph CR and Tilbrook AJ. INVITED REVIEW: The usefulness of measuring glucocorticoids for assessing animal welfare. J Anim Sci 2016; 94: 457–470.
12. Yu T, Xu H, Wang W, et al. Determination of endogenous corticosterone in rodent's blood, brain and hair with LC-APCI-MS/MS. J Chromatogr B Analyt Technol Biomed Life Sci 2015; 1002: 267–276.
13. Thorpe JB, Gould KE, Borman ED, et al. Circulating and urinary adrenal corticosterone, progesterone, and estradiol in response to acute stress in female mice (Mus musculus). Horm Metab Res 2014; 46: 211–218.
14. Engeland WC, Massman L, Mishra S, et al. The adrenal clock prevents aberrant light-induced alterations in circadian glucocorticoid rhythms. Endocrinology 2018; 159: 3950–3964.
15. Leenaars CH, Dematteis M, Joosten RN, et al. A new automated method for rat sleep deprivation with minimal confounding effects on corticosterone and locomotor activity. J Neurosci Methods 2011; 196: 107–117.
16. Pierard C, Dorey R, Henkous N, et al. Different implications of the dorsal and ventral hippocampus on contextual memory retrieval after stress. Hippocampus 2017; 27: 999–1015.
17. Hohlbaum K, Bert B, Dietze S, et al. Severity classification of repeated isoflurane anesthesia in C57BL/6JRj mice – assessing the degree of distress. PLoS One 2017; 12: e0179588.

18. Nohara M, Tohei A, Sato T, et al. Evaluation of response to restraint stress by salivary corticosterone levels in adult male mice. J Vet Med Sci 2016; 78: 775–780.
19. Yeh KY. Corticosterone concentrations in the serum and milk of lactating rats: parallel changes after induced stress. Endocrinology 1984; 115: 1364–1370.
20. Cook NJ. Minimally invasive sampling media and the measurement of corticosteroids as biomarkers of stress in animals. Can J Anim Sci 2012; 92: 227–259.
21. Kinn Rod AM, Harkestad N, Jellestad FK, et al. Comparison of commercial ELISA assays for quantification of corticosterone in serum. Sci Rep 2017; 7: 1–5.
22. Report from the Commission to the Council and the European Parliament. Seventh Report on the Statistics on the Number of Animals used for Experimental and other Scientific Purposes in the Member States of the European Union. Brussels, 2013, https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52013DC0859 (accessed 2 October 2019).
23. Gong S, Miao YL, Jiao GZ, et al. Dynamics and correlation of serum cortisol and corticosterone under different physiological or stressful conditions in mice. PLoS One 2015; 10: e0117503.
24. Spackman DH and Riley V. Corticosterone concentrations in the mouse. Science 1978; 200: 87.
25. Grant MJ and Booth A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info Libr J 2009; 26: 91–108.
26. Miake-Lye IM, Hempel S, Shanman R, et al. What is an evidence map? A systematic review of published evidence maps and their definitions, methods, and products. Syst Rev 2016; 5: 28.
27. Snilstveit B, Vojtkova M, Bhavsar A, et al. Evidence & Gap Maps: A tool for promoting evidence informed policy and strategic research agendas. J Clin Epidemiol 2016; 79: 120–129.
28. Leenaars CHC, Freymann J, Jakobs K, et al. A systematic search and mapping review of studies on intracerebral microdialysis of amino acids, and systematized review of studies on circadian rhythms. J Circadian Rhythms 2018; 16: 12.
29. Johnson FK and Kaffman A. Early life stress perturbs the function of microglia in the developing rodent brain: New insights and future challenges. Brain Behav Immun 2018; 69: 18–27.
30. Taves MD, Hamden JE and Soma KK. Local glucocorticoid production in lymphoid organs of mice and birds: functions in lymphocyte development. Horm Behav 2017; 88: 4–14.
31. Pires GN, Bezerra AG, de Vries RBM, et al. Effects of experimental sleep deprivation on aggressive, sexual and maternal behaviour in animals: a systematic review protocol. BMJ Open Science 2018; 2: e000041.

Summary: Severity assessments of experiments conducted on laboratory animals are still mainly based on subjective evaluations; evidence-based methods are scarce. Objective measures, including the determination of stress hormone concentrations, can help to assess severity. A short-term increase in glucocorticoid concentrations generally indicates a healthy response to stressors, whereas a prolonged increase may indicate compromised well-being. As mice are the most commonly used laboratory animal species, we are conducting a systematic mapping review of corticosterone measurements in Mus musculus, to provide a complete overview of the specimen types used (e.g. blood, urine, hair, saliva and milk) and of the analysis techniques. In this publication, we share our protocol and search strategy, as well as our reasons for performing this systematic analysis to advance severity assessment. To date, we have screened 13,520 references and included 5337 concerning primary studies with measurements of endogenous corticosterone in Mus musculus. Data extraction is currently in progress. Once completed, this mapping review will be a valuable resource for scientists interested in corticosterone measurements to support severity assessment. We plan to present the data in a publication as well as in a searchable database, which will make the relevant literature even easier to find. These resources will help to implement objective measures for severity assessment.


Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 33–39
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219872228
journals.sagepub.com/home/lan

Design of a joint research data platform: A use case for severity assessment

Steven R Talbot1, Stefan Bruch2, Fabian Kießling3,4,5, Michael Marschollek6, Branko Jandric1, René H Tolba2 and André Bleich1

Abstract Severity assessment in animal models is a data-driven process. We therefore present a use case for building a repository for interlaboratory collaboration that supports uploading specific content, posting group announcements and holding internal prepublication discussions. We show that such a structure can be offered with minimal effort and a basic understanding of web-based services, while also taking into account the human factor in individual data collection. The FOR2591 Online Repository serves as a blueprint for other groups, so that one day not only will data sharing among consortium members be improved, but the transition from the private to the persistent domain will also be easier.

Keywords repository, severity assessment, data collection, Small Data, Big Data

Date received: 23 February 2019; accepted: 24 July 2019

1Institute for Laboratory Animal Science, Hannover Medical School, Germany
2Institute for Laboratory Animal Science and Experimental Surgery and Central Laboratory for Laboratory Animal Science, RWTH Aachen University, Germany
3Institute for Experimental Molecular Imaging, RWTH Aachen University, Germany
4Helmholtz-Institute for Biomedical Engineering, RWTH Aachen University, Germany
5Fraunhofer MEVIS: Institute for Digital Medicine, Germany
6Peter L. Reichertz Institute for Medical Informatics, Hannover Medical School, Germany

Corresponding author: Steven R Talbot, Institute for Laboratory Animal Science, Carl-Neuberg-Str. 1, Hannover, Lower Saxony 30625, Germany. Email: [email protected]

Monitoring the well-being of animals and deviations from it requires evidence-based multimodal techniques.1 In the search for optimal indicators and variables, empirical methods are indispensable. As such, combining multivariate data for an unbiased assessment of well-being and severity categorisation requires advanced knowledge of statistical modelling. In this context, 'Big Data', 'data mining', 'machine learning' (ML) and 'artificial intelligence' (AI) are keywords researchers are often confronted with. Defining these fields is still a matter of ongoing research. In our opinion, however, AI is an unspecific term for largely automated (adaptive) computational problem solving, while ML takes a more direct approach, usually by finding optimal feature representations followed by classification/regression analysis. Whereas data mining applies algorithmic approaches to large data sets, Big Data is not innately connected to the preceding keywords. It is usually defined by the five Vs (volume, variety, velocity, validity and value) and has no analytical qualities per se. These methods not only allow targeted analyses of single animals but can also lead to general models, for example describing severity levels.2 The key issue here is data quality.

On a formal level, acquiring and evaluating experimental data are at the core of every empirical science. Therefore, careful, structured and efficient handling of data stocks is crucial to gain new findings. In particular, data conservation is key to safeguarding good scientific practice and ensuring that results are reproducible at any time.3 Standardised data handling is realised by means of a data management plan implementing the Curation Domain Model, which splits the data life cycle into four phases called 'domains'.4 These are:

1. Private domain: the phase in which research groups perform their experiments, collect primary data and decide whether the data are worth further analysis at the group level.
2. Group domain: this domain serves as a data pool to collect data and share them with groups working on similar issues.
3. Persistent domain: here, data are pre-selected and prepared for long-term conservation.
4. Access domain: the final stage of long-term data curation.

Private domains are under researchers' responsibility, whereas long-term storage solutions require final definitions of data structures. The urgent task is therefore to develop a repository structure to operate the group domain for severity assessment in animal models. In contrast to a conventional linear life cycle, this domain design allows data to be reused at a very early stage, nearly in parallel to their primary use. Collaborative work on a common data stock offers a large variety of additional analytical capabilities. Furthermore, the availability of amounts of data that are large compared with the usual scope in animal science may avoid redundant experiments and raises hopes of applying methods of computational KDD (Knowledge Discovery in Databases) in addition to classic meta-analyses.

In this report, we suggest an easy-to-use platform as a use case for data sharing, information distribution and knowledge pooling within a limited group of peers. To gain the best value from scientific data, especially when animals are involved, strategies to combine multiple data sets need to be laid out.5 We will therefore discuss some minor and major aspects of the development process of a repository for use in severity assessment, focusing on our experiences concerning compliance and interlaboratory data harmonisation.

Use case: Repository organisation

Objective and preliminary work

To harmonise qualitative scales and measurement procedures for severity assessment in different scenarios, it is crucial to collect and merge data from experiments of different design and methodology. The solution of choice would be a front-end controlled database system. Approaches that drew on solutions established in clinical research, or on our own developments, failed due to high usage complexity and a mismatch with requirements. An intermediate solution is therefore based on a repository containing data sets in files. The file format and the entire system are refined stepwise to prepare a final, future database solution for standardised data.

Repository framework

The online repository FOR (FOR2591 Online Repository) runs as a sub-domain on https://for.severity-assessment.de. Sites will subsequently be shown as sub-paths in the format './path'. On the 'for' sub-domain, a WordPress framework (current version 4.9.8) was installed, followed by the Twenty Sixteen theme. Within WordPress, the following plugins are used: Contact Form 7, Embed Any Document, Password Protected, TinyMCE Advanced, VFB Pro and Wordfence Security. A basic tutorial can be found at https://learn.wordpress.com/.6

General structure

The basic structure of the repository is that of a classic blog, that is, it consists of sites and posts within a content management system (CMS). Depending on their domain state, data can be preliminary, sensitive or due to be published, so data access is restricted to a need-to-know basis.

Static sites were used in structuring the repository, while posts contain the individual experimental information. Experimental data are linked and displayed as individual files in the respective post and are bundled project-wise as zip files on the general './repository' site. This flexible framework offers many advantages for project partners, since it allows easy publication and data sharing while maintaining readability and structure. In addition, it limits the level of database proficiency required of the actual users. The comment function allows easy information sharing and discussion, and can be used extensively if required.

The repository starts with a splash screen giving an overview of the different options on the site. Users can decide whether they want to 'visit the data repository and download raw data', 'learn about the individual projects', 'read more about FOR', 'download FORm for uploading data' or 'contact the admin'. This meta-structure is also accessible via the main menu at the top of the website (Figure 1). The menu structure has to be defined within the CMS. If the repository becomes more complex, the overall structure of the menu can easily be adapted.

In a widget section, current news is shown (e.g. an active data collection call, a new publication or a new meeting). Any updates to the data collection sheet (FORm) and so on are also shown here. Further, the user can use a search field to find posts or data of interest. In the lower section of the widget area, the five most recent posts/projects are listed, followed by a list of keywords together with the corresponding topic frequencies.

Figure 1. General structure of the FOR repository, with links to a separate upload/information site and specific project folders. Each research item has a connection to the repository, where all files can be downloaded as a bundle.

Data repository organisation

Data organisation in the repository is twofold. First, all uploaded data are stored as *.txt and are bundled in *.zip files under './repository'. Individual files can be downloaded directly by clicking on the provided links. For later organisation purposes, a tag is provided which makes it easier to find project-specific experiments. […] project names. This has proven to be more understandable by users from different backgrounds. Final files are uploaded by the admin via file transfer protocol (FTP) to the server and hyperlinked to the meta-information of the respective study.

Use case: Data collection

In order to harmonise the data-collection process, a form (FORm) was developed in several steps. FORm in its current version (v1.1.1) is provided as supplemental material to this report or can be downloaded from the repository. All project partners were required to organise their data in this MS Excel spreadsheet (FORm). The file consists of several sub-tabs to organise the data-collection process further. The first tab ('readme') contains concise information on how to format data in the query sheets. It is followed by a tab with general information on the study (who, how, where, etc.), which was finally upgraded with a section containing an excerpt of the ARRIVE guidelines. In addition, information on ethical statements, severity grading, funding and conflicts of interest is covered, so that each upload has sufficient metadata and annotations available.

The next tabs are reserved for data entry. Each variable is stored in a separate tab, and the data are formatted in the long format (Table 1). Animals are coded in rows, and variables are coded in columns.
ments when the repository becomes larger. In addition, Each row also contains an atomic entry called ‘index’. there is a search function for finding distinct keywords Then, a column with information on ‘treatment’ is set such as work groups, experimental topics and param- before each measurement. Experimental time points are eters using the metadata of the included data sets. A list ordered as multiple row entries per animal, and the time of categories in the widget area can be used to narrow component is encoded as a variable. down the search for keywords. Further, a second link FORm also contains a section for explanatory plots connects the data with the actual project description. and analyses performed by the actual project partner, Here, each study is described according to the entries n as well as a comment section. In case information in the the ‘information tab’ of the current version of FORm. data tabs is not sufficient, a sample information file Below the actual project description, the corresponding (SIF) can be provided as a separate tab. Here, add- data can be downloaded directly as *.txt files. itional variables such as sex, age or further treatments can be entered. The FOR2591 identification number Data upload serves as a common terminology in all data items. At the time of publication, FOR contains 14 data sets Data upload to the repository is a managed process. from studies originating from various groups of the The repository’s admin reviews and curates any pro- consortium. As the focus of the first survey lies on vided data before bringing it to the website by harmo- assessing the impact of severity on body weight, the nising headers in terms of variable names and time current structure contains 215,878 singular items on formats, cleaning tables from comments and restructur- this specific matter. ing potential tabular misfits. The ‘./data-upload’ section therefore contains information on how to structure and Discussion upload data in general. 
Special care is taken in naming the uploaded files systematically. The naming pattern A large data pool gives the opportunity to develop new was set to ‘workgroup_project_subproject_year_ KDD strategies. Maybe it can even serve as a blueprint month_day.xlsx’. However, in the repository, this for future Big Data solutions. The Big Data hype with naming was not used. Post IDs were set to individual its promises of better insights into all sorts of analytical 36 Laboratory Animals 54(1)

Table 1. Example for a data line-up in long format. which make it arduous to establish unified data-storage Animals have multiple rows, one for each time point. solutions for larger projects and consortiums, as the Variables are stored column-wise for each treatment. specific needs are highly individual. Cloud-based com- Index Treatment Time BW mercial solutions such as labfolder exist, but they offer limited control over content (management) and reposi- NW001 restraint_stress Baseline 22.6 tory structure and may require additional fees.9 NW001 restraint_stress 0 22.5 A joint research repository is a first step in the NW001 restraint_stress 1 21.8 development of a regular database for experimental NW001 restraint_stress 2 22.3 data. However, an actual database requires know- NW001 restraint_stress 3 22.4 ledge not only on building the actual framework but NW002 restraint_stress Baseline 21.0 also on query design and getting the most out of it. Furthermore, there is no standardised method of NW002 restraint_stress 0 20.6 communicating information between scientists or data NW002 restraint_stress 1 20.4 sets. Therefore, a blog-like structure offers the optimal NW002 restraint_stress 2 20.2 solution in covering both sharing data and allowing NW002 restraint_stress 3 20.3 communication among peers. The format is also well NW003 restraint_stress Baseline 22.2 suited for sharing the uploaded data, since only a lim- NW003 restraint_stress 0 22.0 ited set of variables are collected. There is simply BW: Body Weight. no need to build an oversized relational database for the given complexity of data. In addition, during the whole process, the aim was to learn as much as possible about the data-collection process and the dynamics of a questions is long over. As it became somewhat of a scientific consortium. daily reality, the possibilities of Big Data were accom- One additional benefit of a CMS is its relative ease of panied by evolutions in the fields of machine and use. 
Most users are intuitively familiar with blog naviga- deep learning, which now offer not only interesting tion, since many of us use the Internet every day. For applications for our daily lives, but also new insights most people, it is a fact of life to retrieve information for science itself. from websites such as FOR. This is a major advantage However, specifically in animal sciences, with compared to regular databases. Furthermore, it is a few exceptions, we have not even arrived at the stage of great platform for sharing news among project partners. Big Data. The reasons for this are manifold. One major factor is certainly that storage solutions are usually lim- Challenges and disadvantages ited to ‘in-house’ applications (private domain) and are only shared after publication (group/persistent domain). Of course, there are also some disadvantages in using a When publication goals are achieved, the interest in con- CMS. First, there needs to be an administrator to run tributing data to larger databases is meagre, since this and update the system. This requires some training in appears to offer no further benefit, as was exemplarily CMS (e.g. WordPress, Joomla, and others). Some shown in a study by Campbell et al.7 The paradigm experience with servers and ftp protocols is also cer- of publishing data only once puts even more pressure tainly helpful for uploading files and changing php on this system. and CSS files. Setting up an environment such as Furthermore, there are no easy-to-understand and FOR requires a hosting process, and the risk of getting thorough concepts on how to analyse and rate experi- hacked is substantial for popular CMS. mental data on a large scale – let alone giving the data Managing the content is another obstacle. Data pro- translational meaning. This includes the quest for vided by project partners are usually very variable. ‘the best parameter’ indicating potential severity in ani- Even if different labs measure the same outcomes, mals. 
Although meta-analyses tools and statistical con- they will invent unpredictable ways of data collection cepts are available, it still remains rather unclear how and interpretation, even though standard operating these relate to daily routines and animal well-being. procedures (SOPs) are implemented and in spite of Nevertheless, one promising approach lies in the emer- FORm’s attempt to provide a mutual standard for ging importance and use of systematic reviews and risk data collection. In our experience, occurring variance analysis. These techniques give insights into animal in data collection still has to be harmonised manually experiments by using already published data and stat- by the admin and requires an active communication istics, eventually allowing a refinement, reduction or role, even though people have agreed to use standar- replacement of animals (3R principle by Russel and dised procedures. Scientists will have to be called, Burch8). In reality, many different obstacles remain emailed or asked to change and adapt the provided Talbot et al. 37 data until everything is unified. This has proven to be Furthermore, there are some aspects which cannot one of the more work-intense steps in building the reposi- be solved by data science. Experiments often differ in tory. This is particularly true for data deriving from non- individual set-up parameters such as different time invasive imaging, where raw data are often difficult points, fundamental experimental procedures and con- to export and read. Even the reconstruction and storage ditions. Normally, this is regulated by SOPs which of volumetric image data generate a high device manu- every consortium member was obliged to follow and facturer-dependent heterogeneity. Furthermore, imaging which can be found on the internal section of https:// data consume significant storage space. Thus, so far, only severity-assessment.de. 
However, even if there were selected images or the quantitative results of image ana- attempts at standardisation, variation remained. This lyses are transferred into the FOR repository, and the is not easy to document because some of these vari- raw and volumetric data are stored on separate servers ations are not apparently visible in the raw data. at our universities. In this context, we plan to integrate a These potential experimental confounders are an inher- new file format specifically designed to store longitudinal ent weakness of every data-collection process. Special multimodal image data. This standardised and open-file care should always be taken by all consortium members format includes a cryptographic hash to ensure data and admin not only in controlling data quality itself, integrity in a provable and legally accepted way, provides but also in gaining knowledge on how data were gen- fast and efficient image compression, and was developed erated in the first place. To conquer general bias in data in cooperation with several small-animal imaging hard- collection, FORm was invented and curated. It also ware manufacturers and image-analysis software offers a flexible way of adjusting to individual needs if, companies.10 for example, a member requires fields for additional par- A data dictionary usually helps with the definition ameters. Over the whole data-collection period, FORm of variables so that the collection process is unified. was continuously improved. Changes were made public In FOR, the data dictionary was introduced rather using the information widgets in the repository. late in the whole process. Therefore, it is highly recom- mended to define variables right at the beginning of a Pitfalls in data collection project so that format, type and scale of variables are transparent. 
But even if such a dictionary is available, One major factor in data collection for severity assess- users will still be very ingenious at misinterpreting its ment is humans. Ease of use is a central topic to lower meaning. A practical approach for structuring larger inhibition thresholds to join collaborative work. data projects was described by Wulf et al.11 A system- Managing compliance is an issue in the data-collection ised nomenclature such as SNOMED-CT12 may also process. It is essential to establish clear and concise help in structuring systematic data formats. communication protocols for the process as well as Maintaining data quality over time is important deadlines. FOR gave ample possibilities to study these as well. Data degrade over time, since new or supple- effects, which can hardly be generalised because they mental data give new insights. Another important are highly individual. Simplifying the procedures factor in data degradation is insufficient documentation during the whole process will benefit all members. and avoidance of commonly accepted community Until then, we will also continue to approach standards for data collection. Therefore, it cannot KDD and data mining in further publications. The be stressed enough that rigorous standardisation and quality and quantity of collected data are sufficient to harmonisation rules must be adhered to by all project develop and validate algorithms able to perform stan- partners. Further, the collection of sufficient metadata dardised and automated analyses on data sets. We are is of the utmost importance for providing ongoing and aiming at the identification of parameters suitable to long-lasting data quality (e.g. for potential reuse). measure severity of interventions. Scientists and admin will occasionally have to change or update the data. A clear protocol or workflow for Conclusion this is important. 
Although data collection is a demanding process, Data format and harmonisation the final result is worth the effort. With the present use case of a working repository, we have shown a Tabular data are stored in the long format because this way for a consortium of researchers to share data by is the favoured structure for data analysis. For users, using a Small Data approach. We provide a clear struc- data entry is explained in detail in FORm, as well as in ture of communication and data format while offering an extensive tutorial within the repository. Further, a general advice and examples for future work groups. harmonised data dictionary is used to standardise, for This we deem as a first step towards planning larger example, variable names. databases so that, one day, animal sciences will be on 38 Laboratory Animals 54(1) its way to Big Data in terms of shared experimental References data. There is still a long way to go until recent meth- 1. Keubler LM, Hoppe N, Potschka H, et al. Where are ods of deep learning and so on can be used on com- we heading? Challenges in evidence-based severity assess- bined harmonised data (e.g. in severity estimation). ment. Lab Anim 2020; 54: 50–62. However, there is still much to be learned from behav- 2. Ha¨ger C, Keubler LM, Talbot SR, et al. Running in the ioural, physiological and biochemical data and even wheel: defining individual severity levels in mice. PLoS more so when methods of ML and mathematical mod- Biol 2018; 16: e2006159. elling are applied. All these new methods require very 3. Deutsche Forschungsgemeinschaft. Sicherung guter large amounts of data to enable generalisation. Our wissenschaftlicher Praxis [Safeguarding Good Scientific Practice]. 
Denkschrift Memorandum, 1998, URL: approach of concentrating on a Small Data approach https://www.dfg.de/download/pdf/dfg_im_profil/reden_ is a first tap into this topic while offering great usability stellungnahmen/download/empfehlung_wiss_praxis_ for all contributing scientists. 1310.pdf. 4. Treloar A, Groenewegen D and Harboe-Ree C. The data Acknowledgements curation continuum: managing data objects in institu- We thank all FOR 2591 scientists and contributors who tional repositories. D-Lib Mag 2007; 13. DOI: 10.1045/ shared their data and made this use case possible. september2007-treloar. 5. Talbot SR, Biernot S, Bleich A, et al. Defining body Declaration of Conflicting Interests weight reduction as a humane endpoint: a critical apprai- sal. Lab Anim 2020; 54: 99–110. The author(s) declared no potential conflicts of interest with 6. Ericson E. Learn WordPress, https://learn.wordpress. respect to the research, authorship and/or publication of this com/ (accessed 4 February 2019). article. 7. Campbell HA, Micheli-Campbell MA and Udyawer V. Early career researchers embrace data sharing. Trends Funding Ecol Evol 2019; 34: 95–98. 8. Russell WMS and Burch RL. The principles of humane The author(s) disclosed receipt of the following financial experimental technique. London: Methuen, 1959. support for the research, authorship, and/or publication of 9. labfolder GmbH. labfolder, https://www.labfolder.com/ this article: This work was funded by the German Research pricing/academia/ (2019, accessed 17 July 2019). Council, Germany (Deutsche Forschungsgemeinschaft) FOR 10. Yamoah GG, Cao L, Wu CW, et al. Data curation for 2591 (FKZ Talbot: Bl953/9-1, Bruch: To542/6-1, Kiessling: preclinical and clinical multimodal imaging studies. Mol Ki1072/20-1, Tolba: To542/5-1 and To542/6-1, Bleich: Bl953/ Imaging Biol. Epub ahead of print 13 March 2019. DOI: 9-1 and Bl953/10-1). 10.1007/s11307-019-01339-0. 11. Wulff A, Haarbrandt B and Marschollek M. 
Clinical ORCID iDs knowledge governance framework for nationwide data Steven R Talbot https://orcid.org/0000-0002-9062-4065 infrastructure projects. Stud Health Technol Inform Stefan Bruch https://orcid.org/0000-0002-6381-7072 2018; 248: 196–203. Rene´H Tolba https://orcid.org/0000-0002-0383-3994 12. SNOMED International. SNOMED-CT, http://www. Andre´Bleich https://orcid.org/0000-0002-3438-0254 snomed.org/ (accessed 18 February 2019).
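The long-format layout that FORm prescribes (Table 1: one row per animal and time point, with the time component encoded as a variable) can be illustrated with a short Python sketch. It assumes, hypothetically, that a partner's body weights arrive as one wide record per animal. The helper names `wide_to_long` and `percent_of_baseline` are illustrative only and are not part of FORm or the FOR repository, both of which work with Excel sheets and *.txt files.

```python
from typing import Dict, List, Tuple

# One long-format row: (Index, Treatment, Time, BW), mirroring the columns of Table 1.
Row = Tuple[str, str, str, float]


def wide_to_long(index: str, treatment: str, weights: Dict[str, float]) -> List[Row]:
    """Hypothetical helper: one animal's wide record (time point -> BW) to long rows."""
    return [(index, treatment, time, bw) for time, bw in weights.items()]


def percent_of_baseline(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: each non-baseline BW as % of that animal's 'Baseline' BW."""
    baseline = {index: bw for index, _, time, bw in rows if time == "Baseline"}
    return {
        (index, time): round(100.0 * bw / baseline[index], 1)
        for index, _, time, bw in rows
        if time != "Baseline"
    }


# Wide input for two animals, using the first rows of Table 1.
wide = {
    "NW001": {"Baseline": 22.6, "0": 22.5, "1": 21.8, "2": 22.3, "3": 22.4},
    "NW002": {"Baseline": 21.0, "0": 20.6, "1": 20.4, "2": 20.2, "3": 20.3},
}

long_rows: List[Row] = []
for animal, weights in wide.items():
    long_rows.extend(wide_to_long(animal, "restraint_stress", weights))

# Print in the Index / Treatment / Time / BW layout of Table 1.
print("Index\tTreatment\tTime\tBW")
for row in long_rows:
    print("\t".join(str(value) for value in row))

pct = percent_of_baseline(long_rows)
print(pct[("NW001", "1")])
```

Time is kept as a column rather than spread across one column per time point. That choice lets the second helper express every weight relative to baseline in a single pass over the rows, however many time points a study recorded.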

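The upload naming pattern 'workgroup_project_subproject_year_month_day.xlsx' described for FOR lends itself to an automatic check before the admin curates a file. The Python sketch below is a hypothetical helper and not a tool described for FOR. It assumes that the three name fields contain no underscores and that year, month and day are zero-padded numbers. The file names in the usage lines are invented.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Upload naming pattern from the text: workgroup_project_subproject_year_month_day.xlsx
# Assumption: the three name fields themselves contain no underscores.
UPLOAD_NAME = re.compile(
    r"^(?P<workgroup>[^_]+)_(?P<project>[^_]+)_(?P<subproject>[^_]+)_"
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})\.xlsx$"
)


class UploadName(NamedTuple):
    workgroup: str
    project: str
    subproject: str
    submitted: date


def parse_upload_name(filename: str) -> Optional[UploadName]:
    """Split a conforming file name into its parts; return None if it does not conform."""
    match = UPLOAD_NAME.match(filename)
    if match is None:
        return None
    try:
        submitted = date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:  # well-formed digits but an impossible date, e.g. 2019_02_30
        return None
    return UploadName(match["workgroup"], match["project"], match["subproject"], submitted)


# Hypothetical file names, for illustration only.
for name in ["wg1_colitis_vwr_2019_02_04.xlsx", "colitis-data-final.xlsx"]:
    print(name, "->", parse_upload_name(name))
```

Returning `None` instead of raising an exception keeps the check usable as a simple filter over a whole upload folder. The admin can then contact the partners whose files fail it.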
Résumé Severity assessment in animal models is a data-driven process. We therefore present a use case for developing a repository for inter-laboratory collaboration, with the ability to upload specific content, make group announcements and hold pre-publication discussions. We clearly show that it is possible to offer such a structure with minimal effort and a basic understanding of web services, while also taking into account the human factor in individual data collection. The FOR2591 online repository (FOR) serves as a model for other groups, not only so that data sharing between the members of a consortium can one day be improved, but also to facilitate the transition from the private to the persistent domain.

Abstract The assessment of the degree of severity in animal models is a data-driven process. In this context, we present a use case for building a repository for collaboration between laboratories, with the ability to upload specific content and to carry out group announcements and internal discussions prior to publication. We clearly show that it is possible to offer such a structure with minimal effort and basic knowledge of web-based services, also taking into account the human factor in individual data collection. The FOR2591 Online Repository (FOR) serves as a blueprint for other groups, so that in future not only is data exchange between working group members improved, but the transition from the private to the persistent domain is also facilitated.

Resumen Severity assessment in animal models is a data-based process. We therefore present a use case for creating a repository for inter-laboratory collaboration, with the potential to upload specific content, make group announcements and hold internal discussions prior to publication. We clearly show that it is possible to offer such a structure with minimal effort and a basic understanding of web-based services, also taking into account the human factor in the collection of individual data. The FOR2591 Online Repository (FOR) serves as a prototype for other groups, so that one day not only will data exchange between consortium members improve, but the transition from the private to the persistent domain will also be streamlined.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 40–49
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219874831
journals.sagepub.com/home/lan

Systematic analysis of severity in a widely used cognitive depression model for mice

Anne S Mallien1, Christine Häger2, Rupert Palme3, Steven R Talbot2, Miriam A Vogt4, Natascha Pfeiffer1, Christiane Brandwein1, Birgitta Struve2, Dragos Inta1,5, Sabine Chourbaji4, Rainer Hellweg6, Barbara Vollmayr1, Andre Bleich2 and Peter Gass1

Abstract Animal models in psychiatric research are indispensable for insights into mechanisms of behaviour and mental disorders. Distress is an important aetiological factor in psychiatric diseases, especially depression, and is often used to mimic the human condition. Modern bioethics requires balancing scientific progress with animal welfare concerns. Therefore, scientifically based severity assessment of procedures is a prerequisite for choosing the least compromising paradigm according to the 3Rs principle. Evidence-based severity assessment in psychiatric animal models is scarce, particularly in depression research. Here, we assessed severity in a cognitive depression model by analysing indicators of stress and well-being, including physiological (body weight and corticosterone metabolite concentrations) and behavioural (nesting and burrowing behaviour) parameters. Additionally, a novel approach for objective individualised severity grading was employed using clustering of voluntary wheel running (VWR) behaviour. Exposure to the paradigm evoked a transient elevation of corticosterone but affected neither body weight nor nesting or burrowing behaviour. However, performance in VWR was impaired after recurrent stress exposure, and the individual severity level increased, indicating that this method is more sensitive in detecting compromised welfare. Interestingly, the direct comparison with a somatic, chemically induced colitis model indicates less distress in the depression model. Further objective severity assessment studies are needed to classify the severity of psychiatric animal models in order to balance validity and welfare, reduce the stress load and thus promote refinement.

Keywords severity assessment, mice, cognitive depression, faecal corticosterone metabolites, nesting, burrowing, voluntary wheel running, animal welfare

Date received: 1 March 2019; accepted: 19 August 2019 1RG Animal Models in Psychiatry, Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Germany 2Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany 3Department of Biomedical Sciences, University of Veterinary Medicine, Vienna, Austria 4Interfaculty Biomedical Research Facility (IBF), University of Animal models in psychiatry are necessary to gain Heidelberg, Heidelberg, Germany 5 insight into the molecular and cellular mechanisms of Department of Psychiatry (UPK), University of Basel, Basel, Switzerland behaviour and psychiatric disorders. Since distress is a 6Department of Psychiatry and Psychotherapy, Charite´-University common aetiological factor in psychiatric disorders, Medicine Berlin, Campus Charite´ Mitte, Berlin, Germany many animal models, in particular for depression, are based on physical, environmental or social stress. This Corresponding author: raises the ethical dilemma of balancing scientific pro- Anne S Mallien, RG Animal Models in Psychiatry, Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, gress with animal welfare concerns. Therefore, it is Medical Faculty Mannheim, Heidelberg University, J5, 68159 important to grade and compare severity in scientific Mannheim, Germany. procedures in order to be able to choose the least Email: [email protected] Mallien et al. 41 discomforting alternative while maintaining efficacy. animals – unless specified further. Locomotion was The principle to reduce, refine and replace (the 3Rs)1 assessed using the open field test for 10 minutes in a animal experiments whenever possible has been inte- 50 cm 50 cm arena, and the pain threshold was grated into legislation in EU Directive 2010/63 and assessed according to the latency to react on a hot into good scientific practice (e.g. 
the ARRIVE guide- plate (see Supplemental Material). lines).2 Within the EU, the severity of every planned Mice were assigned to the following groups: (a) procedure needs to be classified in the project author- home cage controls (C), which remained in the colony isation process. However, the current classification of room throughout the testing phase; (b) handling con- severity levels into ‘mild’, ‘moderate’ and ‘severe’ is trols (H), which entered the shock boxes but never only partly based on scientific knowledge, since system- received a shock; (c) non-trained controls (N), which atic attempts to assess severity are scarce, particularly were exposed to avoidable foot shocks only during the for psychiatric animal models. shuttle box task on day 3; and (d) trained (T) mice, Therefore, we performed a multimodal severity which received unavoidable foot shocks during the assessment study on the stress-induced cognitive first two days of LH training and on day 3 in the shuttle depression model of learned helplessness (LH), a box (Supplemental Table S1). In experiment 3 (see widely used rodent model for depression with excellent below), we omitted the non-trained controls but intro- face, construct and predictive validity.3 It is based on duced a chemically induced colitis group (DSS) as som- exposure to inescapable electric foot shocks, which are atic controls.10 classified with the highest severity score according to EU directive 2010/63. The reasons for this rating, how- Cognitive depression model. The LH paradigm was ever, are unclear; comparisons to somatic models are conducted as previously described.11 Briefly, mice missing. were transferred into chambers with stainless- Hence, we analysed established parameters of steel grid floors. Trained subjects received 360 unpre- impaired well-being in mice to detect the compromised dictable and inescapable foot shocks (0.150 mA) on welfare induced by the LH procedure, including physio- two consecutive days. 
On day 3, trained and non- logical measures such as body weight loss and increased trained control subjects were analysed for helpless corticosterone release,4,5 and behavioural parameters behaviour in shuttle boxes. Each chamber contained a such as nesting6–8 and burrowing.6,9 Additionally, we signalling light, which announced foot shocks used the newly developed unsupervised k-means algo- (0.150 mA) for five seconds in one of the two compart- rithm-based cluster analysis of body weight and volun- ments. The escape performance was analysed during tary wheel running (VWR)10 to grade the level of 30 trials. severity of mice receiving foot shocks individually, In three independent experiments, we assessed cor- and we compared these results with a somatic animal ticosterone release and typical behavioural indicators of model of colitis. well-being (e.g. nesting and burrowing), or we used VWR to classify severity in the different cohorts (an overview of time lines and groups in the respective Material and methods experiments can be found in Supplemental Figure S1 Animals and housing and Table S2). Eight-week-old male C57BL/6 N mice (Charles River Experiment 1: faecal corticosterone Laboratories, Sulzfeld, Germany) were single housed metabolites in conventional macrolon cages (type II) with bedding, nesting material, tap water and food ad libitum (see Forty-eight mice were assigned to the respective groups Supplemental Material). The animals were pseudo- (see Supplemental Material). We sampled faeces to randomly assigned to the respective experimental determine faecal corticosterone metabolite (FCM) con- group. Locomotion and pain thresholds were used to centrations before the onset (pre), during each training stratify mice into groups and to avoid confounding session (training days 1 and 2), during the shuttle box effects. All experiments had been approved by German test (day 3) and one week after the LH (post LH). 
For animal welfare authorities (Regierungspra¨ sidium each day, two samples were collected, representing (a) Karlsruhe; 35-9185-81-G-199-17). acute corticosterone levels during the foot shock expos- ure or sham treatment and (b) a delayed idle period to General behavioural assessment detect persistent effects. Samples were collected in a secondary home cage (see Supplemental Material) and All experiments were conducted within the first three were processed as described.12 Briefly, an extract of hours of the dark phase – the active phase of the dried and homogenised faeces was produced with 42 Laboratory Animals 54(1)

80% methanol, and an aliquot was analysed in a well-established and validated 5α-pregnane-3β,11β,21-triol-20-one enzyme immunoassay (EIA).4,5

Experiment 2: typical indicators of well-being

Twenty-eight mice were assigned to the respective groups. The nest test was performed before and three weeks after the LH procedure. The time to integrate novel material into the nest (TINT) was assessed 1 and 20 days afterwards. Burrowing was analysed 2, 8 and 21 days after the LH test.

Nest building was evaluated according to a rating scale based on cohesion and shape (see Supplemental Table S2). In addition, we assessed nest quality daily at 10:00 am to track this parameter throughout the experiment. In the TINT, sizzle material was introduced in the corner diagonally opposite the nest site, and the latency to integrate it was measured for a maximum of 10 minutes.6

We placed bottles (14 cm long × 5.5 cm Ø) filled with food pellets at the rear of the home cage one hour before the dark phase. We then measured the amount burrowed out of the bottle (% of total weight) after 6 and 24 hours. All mice were habituated to the procedure on four consecutive days, one week before the LH procedure.6

Experiment 3: VWR

Forty-eight mice were assigned to the respective groups (see Supplemental Material). VWR was recorded daily at the beginning of the dark phase. To determine steady-state running performance, an adaptation phase of 16 days preceded experimental onset (see Supplemental Figure S2). The LH procedure was performed on days 2–4. Malfunctions in the running wheel systems made some recordings imprecise, so we excluded these unreliable results. Two cages from each group were affected.

The colitis group received 1% dextran sulphate sodium (DSS; mol wt 36,000–50,000; MP Biomedicals, Eschwege, Germany) for five consecutive days (days 1–5) and remained in the colony room.

Statistical analyses

Statistical analyses were carried out using IBM SPSS Statistics for Windows v24 (IBM Corp., Armonk, NY). The experimental unit was the single animal. Differences were considered significant at p ≤ 0.05. For more information, see Supplemental Material.

Results

FCMs reveal a short-lasting increase due to foot shocks

FCM concentrations, a physiological marker for stress, varied significantly between treatment groups (Figure 1). Acute FCM concentrations were elevated in trained mice on all three days of LH, whereas the merged handling group showed an effect only on day 2. The statistics were as follows:
- LH day 1 (acute): H(2) = 7.373, p = 0.025; post hoc trained vs. home cage p = 0.021.
- LH day 2 (acute): H(2) = 7.373, p = 0.025; post hoc trained vs. home cage p = 0.001 and merged handling vs. home cage p = 0.046.
- Shuttle box (acute): H(3) = 10.291, p = 0.016; post hoc trained vs. home cage p = 0.016.

Overall, the FCM concentration of trained mice changed markedly over time in the acute sample (Friedman: N = 13, χ²(4) = 13.846, p = 0.008). It peaked on LH day 2 and normalised after the procedure, while FCM concentrations of home cage controls remained unchanged. The persistence of the stress response was measured in the delayed samples and reached significance only on LH day 2 (Figure 2; H(2) = 8.482, p = 0.014; post hoc trained vs. home cage p = 0.012 and merged handling vs. home cage p = 0.066).

Foot shocks do not affect typical well-being parameters

We analysed nesting quality, integration of nesting material, burrowing behaviour and body weight before, during and after the LH procedure (Figure 3). Repeated-measures analysis of variance revealed no significant differences between treatment groups and no group × time interactions. Time effects were, however, apparent for burrowing (time: F(3, 69) = 4.523, p = 0.006) and body weight (time: F(4, 92) = 125.561, p < 0.001).

Severity evaluation using VWR behaviour-based k-means clustering does not indicate severe consequences of foot shocks

Daily body weight and VWR performance were used to determine the current status of each mouse by k-means clustering.10 From day 7 onwards, mice in the DSS group lost a small but significant amount of weight compared with the other groups (Figure 4). The effect was largest on day 9 (M = 6.8% (SD = 6.1%), F(3, 39) = 12.776, p < 0.001; post hoc DSS vs. home cage p < 0.001, vs. handling p < 0.001, vs. trained p < 0.001). Overall, body weight changed during the experiment (time: F(13, 468) = 33.979, p < 0.001) and was influenced by DSS treatment (treatment × time: F(39, 468) = 5.932, p < 0.001; treatment: F(3, 36) = 4.930, p = 0.006; post hoc home cage vs. DSS p = 0.003). Neither foot shocks nor handling affected body weight.
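The FCM group comparisons are reported as Kruskal–Wallis H statistics. The degrees of freedom in parentheses are the number of groups minus one. The study ran these tests in SPSS. The following is only a minimal Python sketch of how H is computed from pooled ranks, with tied values given their mean rank and the usual tie correction applied. The function names and test values are hypothetical and do not reproduce the study data.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks for `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        # Pooled ranks are laid out group by group, in the order groups were given.
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    if ties == n ** 3 - n:
        raise ValueError("all observations are tied; H is undefined")
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1
```

Where SciPy is available, `scipy.stats.kruskal` returns the same tie-corrected statistic together with a p value from the χ² approximation.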

Figure 1. In the acute sample, concentrations of faecal corticosterone metabolites (FCMs) increased transiently after exposure to inescapable stress (trained), but also after repetitive handling alone. Handling and non-trained animals were handled identically until the shuttle box test and are therefore merged until then. Data are given as boxplots showing the median (line within the box), 25% and 75% quartiles (boxes) and 10% and 90% percentiles (whiskers). *p < 0.05; **p < 0.01.

Figure 2. In the delayed sample, FCM concentrations increased transiently after repeated exposure to inescapable stress, whereas handling and escapable shock had no effect. Handling and non-trained animals were handled identically until the shuttle box test and are therefore merged until then. Data are given as boxplots showing the median (line within the box), 25% and 75% quartiles (boxes), 10% and 90% percentiles (whiskers) and outliers (dots: 1.5- to 3-fold interquartile range; star: more than 3-fold interquartile range). t: p = 0.06 (trend); *p < 0.05.

Figure 3. Exposure to shocks did not alter behavioural parameters or body weight. Days of shocks are indicated in red on the time line. (a) Neither the nest tests one week before and three weeks after stress nor the daily nesting scores during exposure revealed differences. (b) The time to integrate material into the nest (TINT) was similar in all treatment groups, as was (c) burrowing. (d) Body weight was not affected. In (a), graphs represent the median and 95% confidence interval (CI). In (b)–(d), graphs represent the mean ± standard deviation (SD).

VWR behaviour was sensitive to both foot shock exposure and DSS treatment (Figure 5). During the three days of the LH procedure, performance changed over time (time: F(2, 70) = 8.842, p < 0.001), and a general treatment effect was observed (treatment: F(3, 35) = 5.367, p = 0.004). The trained group differed significantly from the home cage (p = 0.008) and DSS (p = 0.014) groups but not from the handling controls. However, the DSS effects were more pronounced. Running fell to a mean of 40.7% (SD = 23.1%) of baseline on day 8 in DSS mice, compared with a mean of 64.0% (SD = 9.8%) following foot shocks on day 4. The DSS-induced reduction was significant throughout the procedure (time: F(13, 455) = 21.475, p < 0.001; treatment × time: F(39, 455) = 6.232, p < 0.001; treatment: F(3, 35) = 4.310, p = 0.011). In the pairwise comparison, a significant difference was detected only between the home cage and the DSS group (p = 0.008).

The severity score from the k-means cluster analysis showed that most subjects displayed only level 0 or level 1 severity during the three days of the LH procedure (Figure 6). Only 1/12 trained mice reached severity level 2 after each training session. In contrast, DSS treatment shifted up to 8/10 mice to level 2 severity (day 8). Most mice recovered by day 12.
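Running performance and body weight are expressed relative to each animal's own baseline. A value below 100% of baseline therefore means the mouse ran less (or weighed less) than it did in its steady state. The Methods state that a 16-day adaptation phase preceded the experiment, but this section does not say which days define the baseline. The sketch below therefore assumes, hypothetically, that the first few values of each series are baseline days. It shows per-animal normalisation and the group mean and SD for one day. All names and data are illustrative only.

```python
from statistics import mean, stdev


def percent_of_baseline(daily_values, baseline_days=3):
    """Express each daily value as % of the mean of the first `baseline_days` values.

    Hypothetical convention: the first `baseline_days` entries are the final days
    of the adaptation phase and define the animal's steady-state baseline.
    """
    baseline = mean(daily_values[:baseline_days])
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [100.0 * v / baseline for v in daily_values]


def percent_change(daily_values, baseline_days=3):
    """% change from baseline, the scale of Figures 4 and 5 (0 means no change)."""
    return [p - 100.0 for p in percent_of_baseline(daily_values, baseline_days)]


def group_summary(animals, day, baseline_days=3):
    """Mean and SD (n - 1 denominator) of % of baseline across animals on one day."""
    values = [percent_of_baseline(series, baseline_days)[day] for series in animals.values()]
    return mean(values), stdev(values)
```

Normalising within each animal before averaging keeps mice with naturally high or low running distances from dominating the group mean.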

Figure 4. Body weight development (% change from baseline) during and after the respective treatments. Data are given as the mean ± SD.

Figure 5. Voluntary running development (% change from baseline) during and after the respective treatments. Data are given as the mean ± SD.

Figure 6. Voluntary wheel running plotted against body weight in k-means cluster analysis with cluster borders (dashed lines) at different dates of the experiment. Cluster borders separate regions of the graphs into severity level 0 (>87.37), level 1 (between 50.16 and 87.37) and level 2 (<50.16).
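When one variable is partitioned by k-means, the border between two neighbouring clusters lies midway between their centroids. The sketch below illustrates this with VWR values and then assigns severity levels using the borders given in Figure 6. Several assumptions apply. The borders are treated as applying to VWR (% of baseline) alone, whereas the study clustered body weight and VWR jointly.10 The k = 3 Lloyd iteration and its evenly spread initialisation are illustrative choices, not the published procedure. All function names are hypothetical.

```python
from statistics import mean

# Cluster borders from the Figure 6 caption, assumed here to refer to VWR in % of baseline.
LEVEL_0_ABOVE = 87.37
LEVEL_2_BELOW = 50.16


def severity_level(vwr_percent):
    """Assign severity level 0, 1 or 2 from a VWR value (% of baseline)."""
    if vwr_percent > LEVEL_0_ABOVE:
        return 0
    if vwr_percent < LEVEL_2_BELOW:
        return 2
    return 1  # from 50.16 to 87.37, borders included


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on one variable; returns the centroids in ascending order."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    data = sorted(values)
    # Hypothetical initialisation: k points spread evenly over the sorted data.
    centroids = [data[(len(data) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def borders_from_centroids(centroids):
    """In one dimension, the border between neighbouring clusters is the centroid midpoint."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
```

Replacing the hard-coded borders with the output of `borders_from_centroids` would let thresholds be re-derived from a new data set. With a single variable, however, this only approximates the two-dimensional clustering used in the study.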

Discussion

The objective of this study was to perform an evidence-based evaluation of severity in the LH cognitive depression model. For that purpose, we used an established battery of physiological and behavioural tests to detect the magnitude of evoked stress and discomfort.

Our results demonstrate a significant increase in FCMs in the acute samples after exposure to escapable and inescapable shocks. They also show a comparable elevation of FCM concentrations in non-shocked handling controls. Apparently, the foot shocks were not the only trigger of the stress response. Other influencing factors might be the transport to the experimental room or, potentially more important, the introduction to the chambers per se. The chamber has a metal grid floor, no bedding material and no possibility to build a nest, eat or drink. The animals are not physically restrained in the chambers and there is no illumination. Nevertheless, we consider the chamber a stressful environment compared with a home cage setting. The surroundings somewhat resemble metabolic cages, and the 2010/63/EU directive categorises short-term (<24 hours) exposure to metabolic cages as mild in severity. On day 2, repeated exposure to the stressful condition could have triggered a profound stress response due to the previously established association. However, only recurrent exposure to inescapable shocks led to a significant increase in the delayed samples. FCMs have been found to be a sensitive parameter of adrenocortical activity.13 The lack of LH-induced effects might therefore reflect that there was no difference in the magnitude of the acute stress response, so that the treatments could be considered equal in this respect. However, caution is advised, as not every type of stressor may be reflected in measured glucocorticoids.14,15

Therefore, several physiological and behavioural parameters (i.e. nesting, burrowing or body weight) are typically measured to evaluate whether well-being is compromised in stressed animals. Yet none of these parameters was altered by avoidable or unavoidable foot shocks. This is despite the procedure being typically considered a potent stressor that is expected to induce key depressive-like behavioural features. A change in the trained group became visible only when VWR performance was analysed. At the individual level, two animals reached severity level 2 according to the k-means cluster analysis during the LH procedure, although their body weight remained unaffected. Thus, the VWR-based assessment was more sensitive to compromised welfare than typical home cage observations.

Experiment 3 was designed to uncover and grade the impact of the LH model compared with the DSS-induced mild colitis model. For this purpose, we used the unbiased individual severity grading by k-means cluster analysis.10 We chose the colitis model for two reasons. First, the method was initially implemented in this model, so replicating the previous study allowed a direct comparison. Second, the model offers a thoroughly described and evidence-based severity assessment8,10,16 and an approved severity classification according to the 2010/63/EU directive, which depends on the dose and duration of treatment. Here, we chose the moderate severity grade because it was sufficient to induce detectable impairments in the wheel running analysis. It therefore serves as a suitable positive control, although the inflicted pain is of a different modality in the two models. In the depression model the pain is temporally limited, whereas in colitis it is chronic. This might affect the running outcome and could be one reason for the stronger impairment in colitis. In general, the psychological load of depression could contribute less to running performance than pain does. Further studies on running in mice with different emotional states would be necessary to clarify this question.

Here, we observed a larger fraction of subjects at severity level 2 in the colitis model than in the LH depression model. The foot-shock-induced effects instead resembled the milder effects of facial vein phlebotomy.10 Hence, colitis imposes the highest strain and stress, whereas this LH protocol and blood sampling both induce similarly low levels of distress.

Assessing severity requires detailed consideration of potential stressors, which act as confounding factors. Ideally, the stressed group of interest would be compared with a non-stressed control group. Avoiding stress in animal maintenance and handling is, however, very challenging, especially in an experimental context. Circumstantial parameters (e.g. housing conditions) can be associated with a stress response or even impairments of well-being.17 Single housing is often considered harmful to mice.18 Yet we decided to house our mice individually, since in our set-up male mice housed this way showed less burden.11

Other factors also needed to be taken into account. To ensure that the LH result was not confounded by an altered pain threshold or by locomotor differences, we needed to perform the respective experiments first. The brief exposure to a painful stimulus and the illumination during the dark phase in these tests can induce stress. Consequently, a ceiling effect might mask differences between the treatment groups. We tried to avoid this by temporally separating the preceding tests from the LH paradigm.

According to EU directive 2010/63, exposure to an inescapable electric shock is considered severe. DSS-induced colitis, by contrast, is classified as mild to moderate at the dose applied here, and facial vein phlebotomy is classified as mild. This constitutes a dilemma, since the cluster analysis showed larger fractions of DSS-treated mice with a level 2 burden than of LH-treated mice. In comparison, foot-shocked mice were less affected. An absolute assessment is, however, more difficult. Assume that level 2 corresponds to moderate severity. Then most shocked mice remained at level 1 during the LH procedure, which could be considered mild. This indicates a mismatch with the current legal regulations, which rest on anthropomorphic concepts rather than on evidence-based severity. Therefore, more comparative studies are needed to assess and classify the severity of the LH model and other psychiatric animal models objectively.

Acknowledgements
We thank Katja Lankisch for her excellent technical support, Lydia Bussemer for her support in sample collection and Edith Klobetz-Rassam for EIA analysis.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: this work was supported by grants from the Deutsche Forschungsgemeinschaft (Forschergruppe 2591 'Severity assessment in animal based research', project P05) to P.G., as well as the Ingeborg Ständer Foundation and the Research Fund of the UPK Basel to D.I.

Supplemental material
Supplemental material for this article is available online.

ORCID iDs
Anne S Mallien https://orcid.org/0000-0002-8082-6798
Christine Häger https://orcid.org/0000-0002-6971-9780
Steven R Talbot https://orcid.org/0000-0002-9062-4065
Miriam A Vogt https://orcid.org/0000-0002-3400-5067
Andre Bleich https://orcid.org/0000-0002-3438-0254
Peter Gass https://orcid.org/0000-0003-3959-6369

References
1. Russell WMS and Burch RL. The principles of humane experimental technique. London: Methuen, 1959.
2. Kilkenny C, Browne W, Cuthill IC, et al. Animal research: reporting in vivo experiments: the ARRIVE guidelines. Br J Pharmacol 2010; 160: 1577–1579.
3. Vollmayr B and Gass P. Learned helplessness: unique features and translational value of a cognitive depression model. Cell Tissue Res 2013; 354: 171–178.
4. Touma C, Palme R and Sachser N. Analyzing corticosterone metabolites in fecal samples of mice: a noninvasive technique to monitor stress hormones. Horm Behav 2004; 45: 10–22.
5. Touma C, Sachser N, Möstl E, et al. Effects of sex and time of day on metabolism and excretion of corticosterone in urine and feces of mice. Gen Comp Endocrinol 2003; 130: 267–278.
6. Jirkof P. Burrowing and nest building behavior as indicators of well-being in mice. J Neurosci Methods 2014; 234: 139–146.
7. Deacon RM. Assessing nest building in mice. Nat Protoc 2006; 1: 1117–1119.
8. Häger C, Keubler LM, Biernot S, et al. Time to integrate to nest test evaluation in a mouse DSS-colitis model. PLoS One 2015; 10: e0143824.
9. Deacon RM. Burrowing in rodents: a sensitive method for detecting behavioral dysfunction. Nat Protoc 2006; 1: 118–121.
10. Häger C, Keubler LM, Talbot SR, et al. Running in the wheel: defining individual severity levels in mice. PLoS Biol 2018; 16: e2006159.
11. Chourbaji S, Zacher C, Sanchis-Segura C, et al. Social and structural housing conditions influence the development of a depressive-like phenotype in the learned helplessness paradigm in male mice. Behav Brain Res 2005; 164: 100–106.
12. Mallien AS, Palme R, Richetto J, et al. Daily exposure to a touchscreen-paradigm and associated food restriction evokes an increase in adrenocortical and neural activity in mice. Horm Behav 2016; 81: 97–105.
13. Palme R. Non-invasive measurement of glucocorticoids: advances and problems. Physiol Behav 2019; 199: 229–243.
14. Mormède P, Andanson S, Aupérin B, et al. Exploration of the hypothalamic–pituitary–adrenal function as a tool to evaluate animal welfare. Physiol Behav 2007; 92: 317–339.
15. Koolhaas JM, Meerlo P, De Boer SF, et al. The temporal dynamics of the stress response. Neurosci Biobehav Rev 1997; 21: 775–782.
16. Biernot S. Beurteilung von Belastungsschweregraden mittels Phänotypisierung und klinischer Überwachung bei Versuchstieren. Tierärztliche Hochschule, https://nbn-resolving.org/urn:nbn:de:gbv:95-110357 (2017, accessed 10 September 2019).
17. Olsson IA and Dahlborn K. Improving housing conditions for laboratory mice: a review of 'environmental enrichment'. Lab Anim 2002; 36: 243–270.
18. Kappel S, Hawkins P and Mendl MT. To group or not to group? Good practice for housing male laboratory mice. Animals (Basel) 2017; 7. DOI: 10.3390/ani7120088.

Résumé (French abstract)
Animal models are indispensable in psychiatric research for understanding behavioural mechanisms and mental disorders. Distress is an important aetiological factor in psychiatric diseases, notably depression, and is often used to mimic the human condition. Modern bioethical considerations require a balance between scientific progress and animal welfare. Consequently, a scientifically grounded assessment of the severity of procedures is a prerequisite for choosing the least compromising paradigm according to the 3R principle. Severity assessment of psychiatric disorders in animal models is rare, particularly in depression research. Here, we assessed severity in a cognitive model of depression by analysing indicators of stress and well-being. These included physiological indicators (body weight and corticosterone metabolite concentrations) and behavioural indicators (nest building and burrowing). In addition, a new approach to objective, individualised severity classification was employed by clustering activity behaviour in a voluntary exercise wheel (VWR). Exposure to the paradigm evoked a transient elevation of corticosterone but affected neither body weight nor nest building or burrowing. However, VWR performance was affected after exposure to recurrent stress, and the individual severity level increased. This indicates that the method is more sensitive for detecting compromised well-being. Interestingly, direct comparison with a chemically induced somatic colitis model indicates less distress in the depression model. Further studies of objective severity assessment are needed to classify the severity of animal models of psychiatric disorders. Such studies would help balance validity and welfare, reduce the level of stress and thereby promote refinement.

Abstract (German)
Animal models in psychiatric research are indispensable for investigating behavioural mechanisms and mental disorders. Strain and stress are an important aetiological factor in psychiatric disorders, particularly depression, and are often used to mimic the human condition. Modern bioethics demands a balance between scientific progress and animal welfare concerns. Therefore, a scientifically grounded severity assessment of procedures is a prerequisite for choosing the least impairing paradigm according to the 3R principle. Evidence-based severity assessment in psychiatric animal models is rare, particularly in depression research. In the present study, we evaluated severity in a cognitive depression model by analysing indicators of stress and well-being. These included physiological indicators (body weight and corticosterone metabolite concentrations) and behavioural indicators (nest building and burrowing behaviour). In addition, a novel approach to objective, individualised severity classification was employed, using clustering of voluntary wheel running activity (VWR). Exposure to the model evoked a transient increase in corticosterone but affected neither body weight nor nest building or burrowing behaviour. However, VWR performance was impaired after repeated stress exposure and the individual severity level increased. This suggests that compromised welfare can be detected better with this method. Interestingly, direct comparison with a somatic, chemically induced colitis model suggests a lower burden in the depression model. Further objective studies on severity assessment are required to classify the burden of psychiatric animal models. Such studies would help reconcile scientific validity and animal welfare, reduce stress exposure and thus promote improvements.

Resumen (Spanish abstract)
Animal models used in psychiatric research are indispensable for understanding the mechanics of behaviour and mental disorders. Anxiety is an important aetiological factor in psychiatric diseases, especially depression, and is often used to mimic the human condition. Modern bioethics demands a balance between scientific progress and concerns about animal welfare. For this reason, a scientifically based severity assessment of procedures is a prerequisite when choosing the least harmful paradigm according to the 3R principle. Evidence-based severity assessments of psychiatric animal models are scarce, particularly in depression research. Here, we assessed severity in a cognitive depression model by analysing indicators of stress and well-being. These included physiological indicators (body weight and corticosterone metabolite concentration) and behavioural indicators (nest-building and burrowing behaviour). In addition, an innovative approach to objective, individualised severity classification was adopted, using clustering of voluntary wheel running behaviour (VWR). Exposure to the paradigm pointed to a transient rise in corticosterone but did not affect body weight, nest building or burrowing behaviour. Nevertheless, VWR performance was altered after recurrent exposure to stress, and the individual severity level increased. This indicates that the method is more sensitive in detecting compromised well-being. Interestingly, direct comparison with the chemically induced somatic colitis model shows less anxiety in the depression model. Further studies on objective severity assessment are needed to classify the severity of psychiatric animal models. Such studies would help balance validity and welfare and reduce the stress burden, thereby promoting more refined methods.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 50–62
© The Author(s) 2019
DOI: 10.1177/0023677219877216
journals.sagepub.com/home/lan

Where are we heading? Challenges in evidence-based severity assessment

Lydia M Keubler1, Nils Hoppe2, Heidrun Potschka3, Steven R Talbot1, Brigitte Vollmar4, Dietmar Zechner4, Christine Häger1,* and André Bleich1,*

Abstract
Evidence-based severity assessment in laboratory animals is an ethical responsibility. It is also imperative for generating reproducible, standardized and valid data. However, as we will elucidate in this review, the path towards a valid study design that determines the degree of pain, distress and suffering experienced by the animal is lined with pitfalls and obstacles. Furthermore, we will consider the genesis of a holistic concept relying on multifactorial composite scales. These have to combine robust and reliable parameters to measure the multidimensional aspects that define the severity of animal experiments, generating a basis for the substantiation of the refinement principle.

Keywords ethics, welfare, refinement, severity assessment

Date received: 21 March 2019; accepted: 19 August 2019

1Institute for Laboratory Animal Science, Hannover Medical School, Germany
2Centre for Ethics and Law in the Life Sciences, University of Hannover, Germany
3Institute of Pharmacology, Toxicology and Pharmacy, Ludwig-Maximilians-University, Germany
4Rudolf-Zenker-Institute of Experimental Surgery, University Medical Center, Rostock, Germany
*Christine Häger and André Bleich shared the last authorship.
Corresponding author: Christine Häger, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany. Email: [email protected]

The replacement, reduction, refinement (3R) concept was established in 1959 to express and give meaning to the ethical concerns associated with the pain, suffering and distress that animals might experience during scientific procedures.1 The concept is firmly anchored, ethically and legally, in animal-based research. Numerous valuable refinement strategies have followed, ranging from the appropriate administration of analgesia to the pre-definition and application of humane endpoints.2–4 The key element of refinement is, however, the prompt recognition and exact assessment of the overall status of the animal.5 Accordingly, since 2010 the European Union has required a prospective and, to some extent, retrospective severity assessment of all experimental procedures undertaken on laboratory animals.6 According to Articles 15, 38, 39 and 54, experimental procedures have to be classified and allocated to severity categories as further specified in Annex VIII. However, there is a multitude of animal and disease models and a scarcity of evidence-based, validated methods. This classification is therefore not a trivial task. It compounds the problems created by an obligatory ethical and legal framework operating under great scientific uncertainty.

The techniques used to determine pain and distress have to cover many facets and dimensions, and this raises questions and creates pitfalls. One concern is how to implement severity assessment procedures in the experimental study design without risking the validity of the study. There are ample examples of assessment parameters that influence the outcome of the respective experiment. They can thereby introduce bias and eventually contribute to the reproducibility crisis, which is in and of itself ethically problematic.7,8 Furthermore, scientists struggle with the design and structure of severity assessment strategies:
- Is there an ideal study design to determine the degree of pain and distress?
- What aspects of the animal's burden have to be recognized?
- Which techniques are appropriate in which model?
- Which techniques will influence each other, risking the generation of additive effects?

Unfortunately, few non-invasive methods are available that do not influence the outcome of study parameters. Minimal interference is, of course, imperative in routine severity assessment. It is less necessary, however, when establishing new techniques and models or when validating a putative refinement measure. It is also questionable whether single parameters are sufficient in this context or whether a multifactorial approach alone will adequately reflect the burden of the animal. Here, however, bias may be an even greater risk factor, for example due to the effect of chronological order or the reciprocal influences of assessment procedures.

In this review, we will address these questions by briefly outlining the terms that determine the severity of animal experiments, focusing on laboratory mice. Using examples, we will extract the benefits and drawbacks of severity assessment techniques and highlight how these techniques might interfere with the purpose of an animal experiment. Furthermore, we will consider the structure of a valid study design to determine pain and distress in animals. Finally, by contrasting a single-parameter approach with a multifactorial one, we will comment on the necessity of a holistic concept to measure all dimensions of the animal's burden.

Figure 1. Multidimensional severity assessment. Connecting and coalescing terms determine the burden of the animal and have to be gauged using multiple parameters.

A beast of burden: Multidimensional severity assessment

According to Annex VIII of the European Union directive 2010/63, the 'severity of a procedure shall be determined by the degree of pain, suffering, distress or lasting harm expected to be experienced by an individual animal during the course of the procedure'.6 In addition, cumulative suffering and prevention from expressing natural behaviour shall be taken into account. The statutory provisions demanding a graded severity assessment therefore encompass the terms 'pain', 'suffering', 'distress' and 'lasting harm'. Definitions of these terms have evolved into a network of connecting and coalescing aspects that determine the burden of the animal (as outlined in Figure 1). We will elucidate these definitions only briefly here, as extensive reviews are available on this topic.9–11

The International Association for the Study of Pain has defined pain as 'an unpleasant sensory and emotional experience associated with potential or actual tissue damage', with a corresponding definition for animals.12,13 Recently, this definition has been extended to include sensory, emotional, cognitive and social components.14 Pain cannot be assessed directly in animals; it is instead deduced from pain-associated behaviours. Indirect methods have therefore been developed that attempt to quantify pain-associated behaviour or nociception.5,9,15,16 It is, however, generally accepted that species-specific multifactorial composite pain scales are imperative, rather than reliance on single behavioural tests.17

For the concepts of 'suffering', 'distress' and 'lasting harm', precise definitions and objective measures are somewhat lacking. Moberg and colleagues have, however, provided working definitions and a conceptual framework for the relationship between stress and distress in animals.9,18 Stress is defined here as the adaptive response to a threat to the animal's homeostasis. Consensus has been reached that distress arises when the biological costs of a preceding stress negatively affect biological functions essential to the well-being of the animal.9,18,19 In line with this interpretation, alterations in biological function may serve as indicators of distress. Following this approach, suffering may arise when the biological adaptation processes fail to return the organism to homeostasis and the negative state affecting biological functions persists. The animal is then left in a state where it can no longer cope with or tolerate the inflicted level of distress or pain.11 In a broader context, the term 'suffering' is often associated with other negative or adverse states such as pain, anxiety or fear, linking these concepts without sufficiently demarcating them. The same may apply to the term 'lasting harm', which may refer to continuing physical damage but may also arise from persistent adverse states including fear, depression or anxiety.

Considered on its own, anxiety is a complex emotion that is challenging to measure. It has a variety of facets and a diversity of underlying hypotheses. Its measurement also relies on diverse behavioural tests that target the psychophysiological processes involved but have been shown to lack validity.20–24 In addition, a view has recently emerged that the suffering inflicted on the animal by scientific procedures can be assessed in its entirety only when aspects of the affective internal/emotional state are incorporated into the assessment. Cognitive bias or judgement bias tests may reflect the internal affective or emotional state of animals. They represent an attempt to include the animal's perspective in severity assessment strategies.25–29 Affective experiences, including emotions, are, however, also subjective states and cannot be measured directly. They have to be interpreted from physiological and behavioural indices, which makes measuring the affective state a major and complex challenge.30

As we have seen, defining the terms used in severity assessment is not straightforward. Distinguishing the terms accurately is challenging, and they are often used interchangeably or in fluid combination. In addition, recognizing pain and distress in non-communicating beings relies heavily on the abilities and experience of the responsible observer, as demonstrated recently.31 It is also a balancing act to deduce the necessary information from indices inferred by observation without relying on assumptions about the animal's feelings.32 In summary, all aspects determining the burden of the animal have to be integrated into a multidimensional severity assessment relying on physiological, behavioural and biochemical techniques (see Figure 1).

Techniques to assess severity: Balancing burden, bias and benefits

Guidelines on the recognition of pain, distress and discomfort in experimental animals were introduced as early as 1985. They took into account changes in appearance, body weight, behaviour and clinical signs.2 These guidelines have since been extensively reviewed, and further assessment techniques, approaches and recommendations have been introduced. Early attempts to generate indices based on the numerical rating of severity components were later extended to assess severity or welfare using systematic protocols and elaborate score sheets. These efforts also focused on appropriate recording and reporting systems.4,10,11,33,34 Critics have noted, however, that the putative objectivity may be merely superficial. Most parameters have not been validated per se but have been assumed to be related to pain and distress without the necessary statistical weighting or correlation. This makes them subjectively chosen and highly observer dependent.11,32,35 It is therefore imperative that severity assessment parameters are evaluated and validated as to whether they can serve as objective tools to detect pain and distress. In this context, we aim to provide examples of currently used parameters for severity assessment, describing their burden and benefits and focusing on possible interference with the experimental study design. New approaches develop constantly, so this list is not intended to be exhaustive. Instead, it is meant to draw attention to obstacles and pitfalls that arise during severity assessment and to raise awareness of the bias they can introduce into experimental studies. Following the components compiled by Hawkins and colleagues, we have categorized severity assessment parameters into physiological, biochemical and behavioural techniques (see Table 1).34

Clinical score sheets are, of course, major tools of severity assessment in laboratory animals and have been developed and refined for multiple animal and disease models.2,36,37 Their benefits include successful and adequate application in experimental set-ups and easy implementation. Depending on the score sheet design, they also require only limited handling of the animal.38 There are, however, increasing concerns regarding sensitivity, validation, standardization and observer dependency.34,39–41 A major challenge in this context remains to ascertain whether clinical scoring sufficiently reflects mild to moderate states of pain and distress. If clinical scoring is applied, emphasis should be placed on explicitly trained personnel and on validated, standardized and specifically tailored score sheets, as has been extensively discussed and reviewed.31,33,41,42 Particular emphasis is often placed on monitoring body weight during experimental procedures, on the assumption that a negative energy balance will reflect the severity the animal experiences.2,43 Indeed, shifts in energy utilization or energetic inefficiency may result from many stressors. There are multiple examples of body weight being applied successfully as a severity assessment parameter in experimental studies. Nevertheless, its insufficient sensitivity, validation and correlation with other parameters have been criticized.44–46 The often-applied strict division into predefined categories has, to our knowledge, never been validated and should therefore be critically evaluated before being used in severity

Table 1. Burden, bias, benefits and critical control points of individual severity assessment techniques.

What are the critical Parameter What burden is measured? What bias may be introduced? What are the benefits? control points? References

Physiological Clinical scoring - Aimed at recognition of pain, - Observer dependency - Easy to perform - Education and training 2,31,33, distress and disease - Inter-rater variability - Not time consuming programmes (to ensure 34,36,37, progression - Dependency on subjective - Non-invasive (depending on observer experience, reliability 38,39–42 - Score sheets generally criteria (each criterion should score sheet design, animals and validity) encompass physiological and be validated) need not necessarily be - Score sheet design (validated, simple behavioural parameters - Dependency on robustness of handled) standardized and tailored to (mainly focused on hyper- and the respective score sheet species and model) hypoactivity) design - Standardization - Dependency on sensitivity (score sheets require validation that mild to moderate states of pain, distress and disease will be recognized as well) Body weight - Loss or gain generally - Influenced by many variables, - Observer independent - Validation (does change of body 2,43–47 monitored on the assumption which need not necessarily be - Easy to perform weight reflect severity in the that the course of body weight model or severity related - Not time consuming specific model?) can directly correlate with - Direct correlation with severity - Non-invasive (although animals - Correlation with other disease progression or degree is controversially discussed, have to be handled) parameters? of pain, distress or suffering rather model dependent - Standardization (may reduce - Sensitivity bias, e.g. 
handling, transport) Telemetry - Aimed at recognition of pain or - Influenced by many variables, - Observer independent - Validation and correlation with 48–49 distress by monitoring non- which need not necessarily be - Automatized assessment other parameters specific physiologic parameters model or severity related possible - Standardization such as heart rate or core body - Due to high degree of invasive- - Signs of pain and distress may - Automatization temperature that have to be put ness potential confounding be detected that would not be into context effects on study outcome monitored by direct observation - Reduction of inter-animal variability Behavioral Mouse grimace - Pain (spontaneously emitted - Low-grade observer-depen- - High inter-rater reliability - Standardization 50–56 scale pain behaviour measured dency (also dependent on level - Easy to perform, but requires - Automatization according to facial expressions) of automatization) necessary equipment - Evidence on the reflection of - May require handling, - Expenditure of time dependent other emotional states short-term separation or single on stage of automatization housing - Non-invasive (although animals - Dependent on reliable protocol may have to be handled) and experimental set up

(continued) 53 54

Table 1. Continued.

What are the critical Parameter What burden is measured? What bias may be introduced? What are the benefits? control points? References

Nest building - Pain, distress, disease - Observer dependency - Sufficient inter-rater reliability - Education and training 58–60 progression - Potential inter-rater variability (dependent on optimization of programmes (to ensure (dependent on optimization of protocol and training) observer experience, reliability protocol and training) - Easy to perform and validity) - Dependent on reliable protocol - Not time consuming - Protocol optimization (tailored and detailed scoring system - Non-invasive to species and model) - Influenceable by environmental - Standardization variables Burrowing - Pain, distress, disease - Dependent on reliable protocol - Observer independent - Standardization 57–58,61 progression - Easy to perform - Not time consuming - Non-invasive Voluntary - Pain and other severity aspects - Potential effects on experimen- - Observer independent - Automatization 62–68 wheel (recent evidence on the reflec- tal results have been described - Easily implemented, but - Standardization running tion of the motivational, emo- - Influenceable by environmental requires necessary equipment tional and cognitive state of variables - Non-invasive animals) - May require single housing - May be applicable in group housing Biochemical Corticosterone - Stress, although assays do not - Influenceable by many - Observer independent - Establishment and validation 69–71 distinguish between distress variables, which need not - Easy to perform after estab- of protocol and assay and eustress necessarily be model or lishment and validation - Standardization severity related - Measurement of faecal - Successful application depend- corticosterone metabolites ent on assay selection, non-invasive sampling procedure and physiological as well as aoaoyAias54(1) Animals Laboratory analytical validation - Sampling procedure may impact on study outcome depending on degree of invasiveness (e.g. blood sampling) Keubler et al. 
assessment strategies, as it may not necessarily reflect model-specific dynamics or the multiple physiological processes involved in the change of body weight. Furthermore, these pre-set divisions vary among guidelines as well as among available clinical score sheets, which assign different percentages of body weight loss to a varying number of corresponding severity categories or score points.2,11,40,42 In a critical appraisal of the suitability of body weight monitoring for severity assessment in different models, a minority of mice reached the predefined humane endpoint for body weight loss but did not show any clinical conspicuities, underlining the strong necessity for validation (Talbot et al., this issue).47 Physiological parameters may also be collected by invasive approaches, for example by telemetric recording after transmitter implantation. A major benefit is observer independence. This is contrasted by a high degree of invasiveness and the monitoring of non-specific parameters such as heart rate, core body temperature or locomotor activity. These may be influenced by many variables and have to be interpreted in the overall context, underlining again the necessity for standardized conditions.48 However, telemetric recording may prove useful in detecting signs of pain and distress that otherwise would not be noticed by direct observation, as described, for example, when assessing post-laparotomy pain in laboratory mice.49

Furthermore, a multitude of behavioural severity assessment parameters is available. Grimace scales have been established in several species to assess spontaneously emitted pain behaviour. Validity studies have been performed, for example, to ensure that handling techniques do not confound the assessment.50–52 Their benefits, including a high accuracy and inter-rater reliability with potential for standardization and automatization minimizing observer dependency, as well as their limitations, have been extensively reviewed.51,53,54 Furthermore, other emotional states have recently been assessed by the analysis of facial expressions.54–56 In addition, other home cage-based behaviours such as burrowing and nest building can serve as indicators of wellbeing, more specifically as parameters in models of psychiatric disorders and to assess pain, distress and disease progression.57,58 However, environmental factors may have an impact. It has been shown, for example, that nest behaviour was dependent on cage size, an easily overlooked husbandry detail.59 Therefore, again, standardization is crucial, as assessment protocols may vary between laboratories, resulting in the absence or changed manifestation of these behaviours (Schwabe et al., this issue; Jirkof et al., this issue).60,61 Another motivationally or emotionally driven behaviour, namely voluntary wheel running, has been used to assess the severity of experimental procedures.62 Benefits include an observer-independent, automatized assessment of severity that has been proposed to serve as an indicator of disturbed wellbeing (Mallien et al., this issue).63 However, several potential effects on experimental set-ups have been described, including increased neurogenesis, anatomical and physical changes and the prevention of learned helplessness/behavioural depression.64–66 Meanwhile, wheel running was shown to be modulated by social interactions. When wheel running was subjected to refinement changes by group housing in a study of this special issue, the running behaviour of mice changed as well (Weegh et al., this issue).67,68 There is also a wide panel of experimental environment behaviours that may be used to assess anxiety- and depression-like behaviour, as extensively reviewed elsewhere.22 It has, however, recently been criticized that the validity of, for example, the plus-maze or the open-field test is insufficient.20,21

With regard to biochemical parameters, measurement of glucocorticoids has been extensively applied for severity assessment with a focus on the analysis of stress. However, it needs to be kept in mind that hormone secretion is subject to many variables and that measurement will not distinguish between distress and eustress; therefore, combination with behavioural or physiological parameters is recommended.69,70 Crucial impact has to be attributed to the sampling method and procedure, as sampling itself can be stressful, thereby affecting study outcome.62 However, the benefits of corticosterone measurement are observer independence and, if for example faecal corticosterone metabolites are measured, non-invasiveness.71

Finally, non-invasive imaging techniques are increasingly available for use in severity assessment strategies, allowing direct visualization and monitoring of disease progression in real time.72 A major drawback, however, is the necessity for anaesthesia, with a potential confounding effect on study outcome and animal welfare.73 Therefore, and due to the high equipment requirements, imaging techniques have mostly been used in combinations to set up and enhance composite scales for severity assessment. However, imaging biomarker candidates for behavioural impairment in rodent epilepsy models have been described recently, expanding the field of available imaging severity assessment techniques, which we will not address separately here.74

It all comes down to study design

As extensively discussed above, although severity assessment techniques are being developed or re-evaluated, interferences with study outcome are abundant. Furthermore, it still has to be determined whether and how techniques will influence each other. It also remains open whether even the chronological order of assessment will introduce bias in severity assessment as well as study outcome, and how these puzzle pieces can be put together to create a valid study design. Here, adherence to the basic principles of good scientific practice as laid out in the respective guidelines is essential. It is generally accepted that flaws in the experimental design of animal experiments will lead to bias and ultimately to a lack of translatability. Nevertheless, it took the wake-up call of the reproducibility crisis to hammer out universal rules not only on the reporting, but also on the design of animal studies.7,8,75,76 Sufficient literature is therefore available on this topic, with emphasis on the sources and control of variability, on practical aspects confronting scientists, and on the statistical fundamentals. It also includes the recently published PREPARE guidelines, which provide recommendations on the preparation of animal studies in the form of a checklist.77–81 Furthermore, the use of a web-based experimental design assistant may prove helpful to detect confounding variables in laboratory animal studies.82 However, although the respective knowledge is freely available, it has only recently been demonstrated that there has been little improvement regarding reporting standards. For example, the anaesthetic and analgesic regimens used in animal research proposals needed optimization, implying there is still room for improvement in the quality of laboratory animal studies.83,84

Apart from adhering to the appropriate guidelines, an additional criterion, namely minimal interference with the experimental study, is essential. There are numerous examples showing that the welfare status of the animal is a major contributor to gaining valid experimental data. Even differential housing conditions or handling techniques were found to influence study outcomes and impact on the comprehension of biological processes.61,85–88 Interestingly, however, in the first systematic approach to determining the influence of environmental enrichment, the mean values of several physiological parameters were affected, but no consistent effect on variability was detected. This demonstrates that housing conditions can be improved without compromising data.89 Besides, scientists need to consider that the effects of pain, stress or distress may compromise experimental results and data validity.90–92 Here, severity assessment itself may be confounded. For example, stress due to early separation from the dam has been shown to influence nociception in rats, and housing conditions influenced anxiety-related behaviour in mice.93,94 This holds true even more when considering the recent discoveries regarding empathy for pain or distress in rodents and its potential impact on severity assessment.95,96

Therefore, to reduce interferences to a minimum, there is a strong need for concise strategies and for the establishment of, and adherence to, standard operating procedures (SOPs). Precisely defined process steps, ideally made digitally available, will then allow for standardization on every level of the study design. This leads to a high internal validity, as impacts on the experimental process are minimized. It has, however, been critically noted that for behavioural testing, emphasis should be put on environmental heterogenization. This can rely on multi-laboratory studies as well as on systematic and controlled within-laboratory heterogenization.97 Here, it may, however, be challenging to improve the precision of estimates, especially with regard to inter-laboratory collaborations. To achieve reproducibility, monitoring the 95% confidence intervals of outcomes across laboratories may be helpful for detecting differences in point estimates and distribution skewness (e.g. by applying bootstrapping methods). The necessity of implementing and harmonizing SOPs in study designs for severity assessment is, however, emphasized by findings described in this special issue. Here, a non-defined detail of the SOP for the assessment of burrowing behaviour in mice led to discrepancies in burrowing performance (Jirkof et al., this issue).61 In another study, scoring of nesting behaviour as predefined in the respective SOP had to be optimized to enhance inter-rater reliability (Schwabe et al., this issue).60 Furthermore, effects of predictability and adaptation to procedures, as well as effects of the chronology of procedures, have to be taken into account. Repeated predictable stress will cause resilience against colitis-induced behavioural changes in mice, and behavioural changes associated with learned helplessness do not occur if the stressor is controllable.98,99 It may also be assumed that the chronology of assessment techniques may result in additive or reciprocal effects.

In summary, the implementation of severity assessment into an experimental study design seems conceivable only in close connection with the establishment of strictly standardized strategies and superordinate systems. These must rely on objective and constantly applicable measures that leave enough room to be tailored to the multitude of animal models and procedures.33 Furthermore, these comprehensive systems will rely on the standardized recording and analysis of physiological, behavioural, biochemical and imaging techniques.33,34

Do we have to have them all?

Early on, it was questioned why there is no simple way to measure animal welfare; single parameters may not accurately depict the burden of the animal, and the application of multiple parameters might yield conflicting results.100 However, the lessons learned from efforts
to optimize pain assessment in animals indicate the usefulness of species-specific multidimensional composite pain scales. These have become a standard procedure for assessing the sensory-discriminatory and the affective-emotional dimensions of pain in veterinary clinics.17 Along this line, it was recently suggested that a combination of approaches analysing the behaviour and appearance of animals should be used for laboratory rodents, although a reference assessment scheme is not yet available.101 In addition, the necessity to select the appropriate indicators as well as the appropriate number of indicators has been postulated (for a detailed guide, see the report of the BVAAWF/FRAME/RSPCA/UFAW Joint Working Group on Refinement).34,102 In this context, it may also be conceivable to set up combinations of indices in accordance with the Five Domains Model. This would result in a conceptual framework evaluating the nutrition, environment, health, behaviour and mental state of an animal.103 Recently, a panel of parameters was successfully used to determine the severity of single and repeated isoflurane anaesthesia procedures.104 In another study, a combination of endocrinological, physical and behavioural parameters was used, yielding no signs of compromised welfare after either single or repeated open-field testing.105 However, choosing from a list of available parameters leaves room for subjectivity.

In this context, validity studies may prove beneficial, as demonstrated recently in a study concerned with grimace scales that used a statistical approach to identify a classifier estimating the pain status of animals.49 Furthermore, when analysing multiple parameters, the application of comprehensive statistical and bioinformatic procedures can provide valuable additional information. For instance, principal component analysis (PCA) of complex datasets can visualize distance and relatedness between animal groups, taking multiple variables into account.106 Thus, regarding evidence-based severity assessment, PCA may aid in finding the most appropriate parameters. This was demonstrated recently in a study assessing severity in a rat model of repeated seizures.106 For this, however, certain criteria such as linear correlation of data have to be met. In addition, PCA only yields uncorrelated components, which correspond to independent features only for Gaussian data. This makes it hard to find independent features in non-Gaussian data, which appear rather abundantly in animal sciences, namely in score values. Therefore, PCA cannot be regarded as a generalized model for finding optimal parameters. For this purpose, heuristic methods based on mutual information and entropy, as well as manifold techniques for non-linear cases, might be better suited. Another approach may focus on the identification of clusters. Clusters in large underlying datasets, as obtained by measuring multiple severity assessment parameters, can be distinguished by cluster analyses. Such an analysis was used in a recent study relying on the automated assessment of voluntary wheel running, a motivationally or emotionally driven behaviour. Here, unsupervised cluster analysis of voluntary wheel running and body weight data, based on the k-means algorithm, enabled the discrimination of distinct levels of severity. This allowed an unbiased individual severity grading in laboratory mice.62 Finally, the comprehensive analysis of multiple behavioural, biochemical and physiological parameters can result in a gain in knowledge about the informative value of a selected parameter and of its relative alteration in an experimental paradigm. Here, correlation patterns may be detected between simple, easy-to-apply behavioural assays or biochemical parameters and more complex behavioural paradigms. The application of comprehensive statistical and bioinformatic procedures can therefore provide a basis for selecting appropriate or candidate parameters with high significance and validity for evidence-based severity assessment.

However, to put these notions into practice, a step-by-step approach may be necessary. First, it will have to be defined which parameters are sensitive enough to measure distress with low inter-rater variability. Then, cut-offs indicating a specific distress level will have to be defined using statistical and bioinformatic procedures. These will have to be validated in independent datasets as well as in different laboratories. The general applicability will then have to be assessed by applying these validated parameters and cut-offs in other animal models. In a next step, suitable parameters will have to be combined into a composite scale. Finally, the validity and broad applicability of this scale will have to be confirmed in independent datasets and distinct animal models as well.

An ethical and legal challenge

In essence, this approach would go a long way towards ameliorating current methods for satisfying ethical and legal requirements in severity assessment. It has become clear that, despite its long history in animal welfare regulation, the area of severity assessment in laboratory animals is at risk of succumbing to what can only be described as an ‘ought-implies-can’ crisis. The lack of a common language and the shortage of agreed definitions create a baseline of ambiguity about what severity means and how it may be measured. At the same time, the ethical and legal framework requires that certainty be established before the justifiability of an experiment is proven. This creates the very unusual constellation in which a socially desirable activity that by necessity takes place in a space of epistemic uncertainty (research) must only be undertaken when ethically justifiable. Justifiability, in turn, ought to be proven to a standard that the epistemic uncertainty itself prevents: severity ought to be proven with certainty, but cannot. The scientific community’s natural response to this challenge is to identify quantitative and reproducible approaches to what regulators will mostly have seen as an inherently qualitative task. Pain, suffering and distress are, first of all, emotional responses that we wish to avoid in a fellow creature; only by subsequent deduction are spikes in an organism’s hormone levels, or morbidly accelerated breathing, emblematic of these feelings. Good scientific practice is itself a paramount ethical concern. The juxtapositional challenge of fulfilling the compelling ethical (and legal) expectations in a way that is compatible with such practice can therefore only convincingly be overcome by establishing strong common standards, sharing data as broadly as possible, improving the training of personnel and continuously developing better techniques.

Conclusion

Unravelling the actual severity of experimental procedures has become mandatory for ethical reasons as well as to ensure the generation of reproducible, standardized and valid data. Considering the variety of influencing factors that, in addition to pain, can contribute to an experimental animal’s burden, it is highly probable that the analysis of the overall or cumulative severity requires multidimensional composite scales with a combination of robust parameters.

However, the path to establishing such an ideal study design is strewn with obstacles. These arise from the multitude of severity aspects, the potential for interference with the experimental study, the lack of standardization and the need for a statistically based, multifactorial and multicentre parameter approach. Therefore, an evidence-based severity assessment needs to account for the potential introduction of bias by implementing highly standardized SOPs, strategies and superordinate systems. Furthermore, constantly applicable, robust parameters should aim at the immediate identification of the actual, real-time severity experienced by the animal. This requires the development of further simple and non-invasive assessment techniques as well as the intra-, inter- and cross-validation of techniques. Here, cross-correlation analyses may guide the future selection of reliable and robust parameters. These can then be combined in a holistic concept that integrates all dimensions of the animal’s burden, ultimately contributing to the realization of the refinement principle.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This review was supported by the Deutsche Forschungsgemeinschaft (DFG research group FOR 2591 ‘Severity assessment in animal-based research’, grant number: BL 953/10-1; PO 681/9-1; ZE 712/1-1; VO 450/15-1).

ORCID iDs

Lydia M Keubler https://orcid.org/0000-0002-8738-9877
Heidrun Potschka https://orcid.org/0000-0003-1506-0252
Steven R Talbot https://orcid.org/0000-0002-9062-4065
Dietmar Zechner https://orcid.org/0000-0002-2075-7540
Christine Häger https://orcid.org/0000-0002-6971-9780
André Bleich https://orcid.org/0000-0002-3438-0254

References

1. Russell WMS and Burch RL. The principles of humane experimental technique. Wheathampstead (UK): Universities Federation for Animal Welfare, 1959.
2. Morton DB and Griffiths PH. Guidelines on the recognition of pain, distress and discomfort in experimental animals and an hypothesis for assessment. Vet Rec 1985; 116: 431–436.
3. Tannenbaum J and Bennett BT. Russell and Burch’s 3Rs then and now: The need for clarity in definition and purpose. J Am Assoc Lab Anim Sci 2015; 54: 120–132.
4. Morton DB. A systematic approach for establishing humane endpoints. ILAR J 2000; 41: 80–86.
5. Flecknell PA. Refinement of animal use: Assessment and alleviation of pain and distress. Lab Anim 1994; 28: 222–231.
6. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes. O J Eur Union 2010; L276: 33–79.
7. Begley CG and Ioannidis JP. Reproducibility in science: Improving the standard for basic and preclinical research. Circ Res 2015; 116: 116–126.
8. Kilkenny C, Parsons N, Kadyszewski E, et al. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One 2009; 4: e7824.
9. Carstens E and Moberg GP. Recognizing pain and distress in laboratory animals. ILAR J 2000; 41: 62–71.
10. Wallace J, Sanford J, Smith MW, et al. The assessment and control of the severity of scientific procedures on laboratory animals. Lab Anim 1990; 24: 97–130.
11. Pain and distress in laboratory rodents and lagomorphs. Report of the Federation of European Laboratory Animal Science Associations (FELASA) Working Group on Pain and Distress accepted by the FELASA Board of Management November 1992. Lab Anim 1994; 28: 97–112.
12. Merskey H and Bogduk N (eds) Part III: Pain Terms, A Current List with Definitions and Notes on Usage. Classification of Chronic Pain, 2nd ed. Seattle: IASP Press, 1994, pp.209–214.

French abstract
Evidence-based severity assessment in laboratory animals is, apart from being an ethical responsibility, indispensable for generating valid, reproducible and standardized data. However, the path towards a valid study design that determines the degree of pain, distress and suffering experienced by the animal is strewn with pitfalls and obstacles, which we elucidate in this review. We further reflect on the genesis of an overall concept based on multifactorial composite scales. These scales make it possible to combine robust and reliable parameters to measure the multidimensional aspects that define the severity of animal experimentation, thereby generating a basis for justifying the refinement principle.

German abstract
Evidence-based severity assessment in laboratory animals is indispensable, and not only out of ethical responsibility, for generating reproducible, standardized and valid data. However, as we explain in this review, designing a valid study that determines the degree of the animal's pain, fear and suffering is complicated by a number of obstacles. In addition, we address the emergence of a holistic concept based on multifactorial composite scales. These scales must combine robust and reliable parameters in order to capture the multidimensional aspects that define the severity of animal experiments, and thereby create a basis for implementing the refinement principle.

Spanish abstract
Evidence-based severity assessment in laboratory animals is, apart from being an ethical responsibility, imperative for generating reproducible, standardized and valid data. However, as this review explains, the path towards a valid study design that determines the degree of pain, distress and suffering experienced by the animal is full of pitfalls and obstacles. In addition, we reflect on the genesis of a holistic concept based on multifactorial composite scales. These scales must combine solid and reliable parameters to measure the multidimensional aspects that define the severity of animal experiments, generating a basis for substantiating the refinement principle.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 63–72
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219879455
journals.sagepub.com/home/lan

Wheel running behaviour in group-housed female mice indicates disturbed wellbeing due to DSS colitis

Nora Weegh1, Jonas Füner2, Oliver Janke2, York Winter3, Christian Jung4, Birgitta Struve1, Laura Wassermann1, Lars Lewejohann5,6, André Bleich1 and Christine Häger1

Abstract
Voluntary wheel running (VWR) behaviour is a sensitive indicator of disturbed wellbeing and is used to assess individual experimental severity levels in laboratory mice. However, monitoring individual VWR performance usually requires single housing, which itself might have a negative effect on wellbeing. In consideration of the 3Rs principle, VWR behaviour was therefore evaluated under group-housing conditions. To test its applicability for severity assessment, this readout was evaluated in a dextran sodium sulphate (DSS)-induced colitis model. For continuous monitoring, an automated system with integrated radio-frequency identification technology was used, enabling detection of individual VWR. After a 14-day adaptation period, mice demonstrated a stable running performance. DSS treatment combined with repeated facial vein phlebotomy and faecal sampling resulted in significantly reduced VWR during the course of colitis and increased VWR during disease recovery. Mice submitted to phlebotomy and faecal sampling but not to DSS treatment showed a smaller reduction in VWR but a longer-lasting recovery. A cluster model that discriminates individual severity levels based on VWR and body weight data was applied. It revealed the highest severity level in most of the DSS-treated mice on day 7. However, a considerable number of mice that underwent the sampling procedures without DSS treatment also showed elevated severity levels. In summary, VWR sensitively indicated the course of DSS colitis severity and the impact of sample collection. Therefore, monitoring of VWR is a suitable method for detecting disturbed wellbeing due to DSS colitis and sampling procedures in group-housed female laboratory mice.

Keywords 3Rs, behaviour, ethics and welfare, housing, social behaviour, wellbeing, wheel running

Date received: 6 March 2019; accepted: 9 September 2019

1 Institute for Laboratory Animal Science, Hannover Medical School, Germany
2 preclinics, Potsdam, Germany
3 Institute of Biology, Humboldt University, Berlin
4 PhenoSys, Berlin, Germany
5 German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
6 Institute of Animal Welfare, Animal Behaviour and Laboratory Animal Science, Freie Universität Berlin, Berlin, Germany

Corresponding author:
André Bleich, Institute for Laboratory Animal Science and Central Animal Facility, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany.
Email: [email protected]

Voluntary wheel running (VWR) behaviour is a frequently used measure of general activity, exploration and migration, and it is associated with mood and reward.1,2 It also serves as an indicator of pain in studies investigating inflammatory pain and nerve injury.3,4 Recently, a cluster model based on VWR behaviour has been established to define individual levels of severity during intestinal inflammation and stress in mice.5 In that study, decreased VWR was observed during the onset and course of dextran sodium sulphate (DSS)-induced acute colitis with graded degrees of inflammation, and also in a model of restraint stress. Combining the parameters VWR and body weight change provided a measure of the severity experienced by the animal. Analysis of these data with two independent mathematical algorithms revealed three severity levels, which might correspond to no, mild or moderate severity.

The 3Rs principle (refine, reduce, replace) of Russell and Burch6 is the ethical and, in Europe, legal prerequisite for animal-based research. However, keeping the severity experienced by animals under experimentation to a minimum requires an unambiguous and objective definition of severity levels. The previously developed cluster model meets these requirements. However, the mice in the study described above were single housed, which might have affected their welfare.7 Housing conditions have to be considered not only with regard to wellbeing but also as a potential confounder. Mice are social animals, and group housing is recommended to avoid social isolation and to maximize wellbeing.8 This topic is, however, controversial. Many studies report altered physiological and behavioural responses as a consequence of social deprivation due to single housing.9 Other studies, in contrast, have shown that stress responses and results of behavioural tests did not differ between group and single housing. They also found no adverse effects of single housing after laparotomy.10,11 In preference tests, mice showed a high demand for social housing.12 Thus, at least for female mice, social housing is generally recommended. In addition, space restrictions in most conventional housing systems allow only a limited number of running wheels within one cage, and individual performance would be almost impossible to track. In the present study, VWR behaviour of group-housed female mice was therefore analysed in a system providing access to running wheels for each individual mouse.

The main goal of the study was to evaluate VWR behaviour in socially housed mice as a parameter for disturbed wellbeing. For this purpose, a new running wheel system for group-housed mice was tested. The system integrates radio-frequency identification (RFID) technology and monitors running performance automatically. Finally, the clinical and behavioural data were entered into a mathematical cluster model to define individual severity levels in mice.

Animals, materials and methods

Ethics statement
This study was conducted in accordance with the German animal protection law and the European Directive 2010/63/EU. All experiments were approved by the Lower Saxony State Office for Consumer Protection and Food Safety (LAVES, license 16/2194).

Mice and experimental set-up
Female, 10- to 11-week-old C57BL/6J (B6) mice were obtained from the Central Animal Facility (Hannover Medical School, Hannover, Germany). Routine health surveillance and microbiological monitoring were performed according to the recommendations of the Federation of European Laboratory Animal Science Associations. They did not reveal any evidence of infection with common murine pathogens.13 Mice were maintained in a room with a controlled environment (21–23 °C; relative humidity 55 ± 5%; 14/10 h light/dark cycle with lights on at 07:00 h and lights off at 21:00 h). Mice were housed in groups of seven in connected Makrolon cages (360 cm²; Figure 1(a)). The cages contained softwood granulate bedding (poplar wood, AB 368P, AsBe-wood GmbH, Germany), which was changed once a week. Nesting material was omitted to reduce potential confounders of activity other than social interaction and wheel running. Pelleted diet (Altromin 1324, Lage, Germany) and autoclaved water were provided ad libitum.

Transponder implantation
After one week of habituation, RFID transponders were implanted under isoflurane anaesthesia on a warming pad. Anaesthesia was induced with 4 vol% isoflurane in a clear box and confirmed by the absence of the withdrawal reflex. It was then maintained at approximately 2 vol% isoflurane. The corneas were protected with Bepanthen® eye ointment. The fur on the cranial back was clipped and the skin disinfected. The RFID transponder was then inserted subcutaneously via a minimal skin opening matching the diameter of the implant (2 mm). To prevent loss, a loosely applied U suture was added. The average total duration of anaesthesia was approximately 7 min. Animals were constantly monitored for adverse clinical signs during anaesthesia and for 1 h afterwards, and daily after implantation. After one week of recovery, the cages were equipped with running wheels.

Group running system
For collection of individual VWR data, a running wheel system with integrated RFID technology was used (IDrevolyzer, by PhenoSys (Berlin, Germany) and Preclinics (Potsdam, Germany)) (Figure 1(a)). Six standard cages (EU Type II), each housing a single running wheel, were interconnected via plastic tubes, so that group-housed mice within this system could freely choose between the wheels. RFID sensors were located outside the cages behind each running wheel and exclusively detected the transponder of the mouse currently active in the wheel. As the readout parameter for each individual mouse, the total number of wheel rotations between 13:00 h and 08:00 h was recorded (VWR(19)). This window omits the 5 h of the light phase during which procedures took place. Further parameters of running activity were the highest number of rotations within 1 min (maximum velocity (Vmax)) and the duration of wheel occupation for each wheel.

Figure 1. Group running system and wheel running parameters during the adaptation phase: (a) schematic illustration of the group running system and a close-up picture of cages 2 and 3. (b) Preference of running wheels (RW) depicted as a heat map; both outer wheels are used with very high frequency by all animals. (c) Course of voluntary wheel running (VWR) and (d) maximum velocity (Vmax) during the adaptation phase, showing a slight increase over time and stable baseline values (n = 21). d: day.

Experimental design, sample collection and DSS colitis
Animals of the control group were handled and weighed throughout the experiment. Animals of the 0% DSS group additionally underwent sample collection on days 0, 5 and 14. Sample collection began with facial vein phlebotomy (20 G cannula for skin puncture, alternating sides) to collect 15 µl of blood for the preparation of dried blood spots. Faecal samples were then collected; for this, animals spent 2 h in separate cages lined with hydrophobic sand (Coastline Global Inc., Palo Alto, USA). Acute colitis was induced in the 1% DSS group by oral administration of a 1% DSS solution via the drinking water over five consecutive days (day 0 to day 4). The DSS-treated mice also underwent the sample collection procedure. All mice were euthanized on day 14 of the experiment by CO2 inhalation, followed by blood collection from the heart and dissection for organ harvesting.

Clinical scoring
Starting three days before transponder implantation, the animals' condition was assessed daily at around 08:30 h using a previously published model-specific clinical score. The score includes parameters such as stool consistency, condition of the fur and behaviour.14 Changes in body weight were evaluated separately. All handling and scoring procedures were confined to two designated, experienced staff members who were not blinded to the experimental group.

Histology
The colon was flushed immediately after removal and prepared as a 'Swiss roll'15 before being fixed in 4% buffered formalin solution.

After embedding in paraffin, the organs were sectioned and H&E stained for microscopic assessment of inflammation. Histological scoring was performed in a blinded manner using a previously published histological score for DSS colitis,14 which evaluates the distal and proximal colon separately.

Statistics
All statistical procedures were run in GraphPad Prism 8 (v8.2.1, GraphPad Software, Inc., La Jolla, CA, USA). Unless indicated otherwise, all values are given as mean ± SD. For one animal of the 1% DSS group, the experiment had to be terminated on day 8 because of high body weight loss (humane endpoint). The assumption of a Gaussian distribution was tested using the Shapiro–Wilk test. Normally distributed data from all three groups were compared using one-way ANOVA with Tukey's post hoc test. All other data sets were analysed with the Kruskal–Wallis test and Dunn's post hoc test. A Mann–Whitney U-test was run on the non-normally distributed histological data (see also tabular results, Supplementary Material Tables 1 to 4 online). For cluster analysis, a previously developed cluster model was used (https://calliope.shinyapps.io/severity_assessment/).

Results

Adaptation to the running wheels and frequencies of wheel occupation
To estimate how long seven group-housed female mice need to reach a stable running performance, individual VWR was monitored from 13:00 h to 08:00 h (VWR(19)) over 14 days. The mice started to run immediately after getting access to the running wheels and performed an average of 4000 revolutions (r) on the first day (Figure 1(c)). Over the next five days, the animals reached a stable running performance of approximately 7000 r/day, which remained at this level until the end of the adaptation period. Maximum running velocities (Vmax) showed a similar course, with mice reaching a stable Vmax of 72 r/min on day 6 (Figure 1(d)).
To analyse whether individual mice showed spatial preferences for particular running wheels, frequencies of wheel occupation were assessed. The mice most readily used the peripherally located running wheels 1 and 6, each accounting for approximately 35% of total usage (Figure 1(b)). The central wheels (wheels 2–5) accounted for only 5–8% of total usage each.

Reduction of wheel running behaviour due to DSS colitis and phlebotomy
To assess the change in VWR behaviour, mice were treated with 1% DSS via the drinking water. They were additionally submitted to sampling procedures on days 0, 5 and 14 (1% DSS; Figure 2(a); Supplementary Figure 1(a)). Analysis of proportional change from baseline revealed a first drop in VWR on day 0, following phlebotomy and faecal sample collection on that day. DSS-related changes in VWR appeared from day 4, with a continuous decrease in wheel running until day 7. After a maximum reduction of 77%, VWR increased from day 8 onwards and exceeded baseline level on day 10.
Mice submitted to phlebotomy and faecal collection alone (0% DSS group; Figure 2(a); Supplementary Figure 1(b)) showed slightly decreased VWR on day 1 after the sampling procedures, which persisted until day 4. This was followed by a further decrease after phlebotomy on day 5, with VWR falling to 40% of baseline level. VWR started to increase again from day 10 and reached baseline on day 13. In contrast, the animals of the control group (control; Figure 2(a); Supplementary Figure 1(c)), which were only weighed each day, showed wheel running behaviour varying around baseline level over the whole observation time.
Vmax remained at baseline for all but one time point in the 1% DSS group. This time point coincided with the day of maximum drop in running activity and showed a statistically significant reduction of 9% compared with the control group (Figure 2(b)). Both the 0% DSS and the control group displayed constant levels of Vmax (Figure 2(b)). Preferences of running wheels were similar in all groups and did not vary during the experiments (Figure 2(c)).

Figure 2. Wheel running parameters during the experimental phase: (a) comparison of voluntary wheel running (VWR) between all three groups (1% dextran sodium sulphate (DSS): n = 6–7; 0% DSS: n = 7; control: n = 7). The groups differed significantly on day 0 (first sample collection) and on days 5 to 8, during and following DSS treatment (1% DSS vs. control: * = p < 0.05, ** = p < 0.01; 0% DSS vs. control: ## = p < 0.01; 1% DSS vs. 0% DSS: $ = p < 0.05). (b) DSS treatment has a statistically significant effect on maximum velocity (Vmax) on days 7 and 8, whereas sample collection has no effect (1% DSS vs. control: * = p < 0.05). (c) The distribution of running wheel (RW) preferences does not differ from the adaptation phase and is not affected by DSS treatment or sample collection. (For detailed results of statistical analyses, see Supplementary Material Tables 1 and 2 online.) bsl: baseline; d: day; arrows indicate sample collections.

Change of clinical score and body weight
The body weight of 1% DSS-treated mice increased slightly until day 3 and then decreased continuously from day 4 to day 7, similar to the observed course of VWR (Figure 3(a)). After a maximum drop of 10.4% on day 7, body weight returned to baseline level on day 9. Both the 0% DSS and the control group showed a slow overall increase of body weight above baseline (Figure 3(a)).
The clinical score reached its highest values in the 1% DSS group on day 4, because most mice developed bloody faeces on the fifth day of DSS administration. After discontinuation of DSS treatment on day 5, the score decreased because faeces were soft but free of visible traces of blood (Figure 3(b)). The animals did not show any further clinical signs of disturbed wellbeing. Neither the 0% DSS group nor the control group exhibited any clinical signs of compromised welfare (Figure 3(b)).

Histological analysis
At the end of the experiment, the colon of mice treated with 1% DSS still reached a histological score of 18 out of a maximum of 46 points. This demonstrates ongoing inflammation without full recovery. Pathological changes were dominated by moderate infiltration of inflammatory cells into the lamina propria, submucosa and lamina muscularis, which also led to peritonitis in three out of six mice. Additionally, alterations of crypt architecture and minimal oedema were found in DSS-treated mice. In contrast, all non-treated mice showed a physiological condition of the intestinal wall (Figure 3(c)).

Application of the cluster model
Transferring the day 7 data to the cluster model cited above5 revealed distinct differences between all three groups. The cluster model had been developed based on body weight and VWR training data from another DSS colitis experiment and the utilization of a k-means algorithm, resulting in the

Clustering showed most animals of the 1% DSS group allocated to severity level 2, while 0% DSS trea- ted animals predominantly clustered into level 1 and control animals mainly fall into level 0 (Figure 4(a) to (c)). The percentage distribution into severity level 2 was 86% in the 1% DSS group, 28% in 0% DSS group and 14% in the control group (Figure 4(d)).

Discussion Monitoring of VWR behaviour has been successfully used to assess individual severity levels in single housed mice.5 The rationale of this study was to inves- tigate whether this approach is also applicable in group-housed mice. Therefore, the impairment of well- being caused by DSS colitis was evaluated by VWR behaviour, clinical scoring and histological analysis of the colon. DSS treated mice showed a significant reduc- tion in VWR and body weight and a significant eleva- tion of the histological score, but very few signs of disturbed wellbeing when considering clinical scoring. Mice merely submitted to facial vein phlebotomy and faecal sampling also showed decreased VWR compared with control mice, while other parameters remained unaffected. Reduction of VWR has already been proven to indicate disturbed wellbeing, for example, in a model of migraine in rats,16 post-surgical pain after partial hepatectomy17 and in models of DSS col- itis and restraint stress.5 However, based on VWR and body weight data, the discrimination of individual levels of experimental severity was possible by integrat- ing the data into a cluster model as described recently.5 Figure 3. Body weight, clinical score and histological Applying this model, DSS-treated mice were allocated results: change of body weight during the experiment to the, within this study, highest severity level on day 7. showed a statistically significant reduction in mice treated However, a considerable number of mice that were with 1% dextran sodium sulphate (DSS) (n ¼ 6–7) compared solely subjected to sampling procedures did allocate with all other groups (0% DSS: n ¼ 7; control: n ¼ 7) from to severity levels 1 and 2, also indicating a level of dis- day 4 onwards with a maximum drop of 10.4% on day 7; (1% turbed wellbeing. Therefore, the current study under- ¼ ¼ DSS vs. control: * p < 0.05, ** p < 0.01, lines and supports the applicability of a mathematical **** ¼ p < 0.0001; 1% DSS vs. 
0% DSS: $ ¼ p < 0.05, tool for severity assessment. $$ ¼ p < 0.01, $$$$ ¼ p < 0.0001). Clinical signs (soft and/ or bloody faeces) were subtle and demonstrated only in Wheel running itself is an intensely studied, but not DSS treated animals ((b); * ¼ p < 0.05). Histological analysis yet fully understood, phenomenon, which has been (c) revealed a statistically significant increase of score in attributed to various sources of intrinsic and extrinsic the DSS treated animals compared with untreated mice motivation, including exploration behaviour, escape, (** ¼ p < 0.01). For detailed results of statistical analyses play and body weight maintenance.1 Furthermore, see Supplementary Material Tables 3 and 4 online. there are concerns about VWR being a pathological, bsl: baseline; arrows indicate sample collections. stereotypic behaviour because of its lack of purposeful function,18 even though it is a behaviour also exerted by wild mice.19 In this study, a stable use of running wheels was observed despite the vast opportunity for social definition of two borders which allocate data points to interaction and the large area provided, suggesting a one of three severity levels (levels 0, 1 and 2). These non-stereotypic behaviour due to other intrinsic or levels were suggested to be within the range of no to extrinsic physiological factors. In addition, none of moderate impact of experimental procedures on the the animals showed an over-excessive use of running overall wellbeing of mice. wheels. Weegh et al. 69

Figure 4. All data points of the experiment based on voluntary wheel running (VWR) and body weight data (a): allocation of individual mice of the 1% dextran sodium sulphate (DSS) group, the 0% DSS group and the control group on day 7 ((b); n = 7 each) to severity levels 0, 1 and 2. (c) Percentage distribution in each group.
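The allocation shown in Figure 4 rests on a cluster model derived with a k-means algorithm from body weight and VWR data. The sketch below illustrates only the general idea and is not the published model: it assumes both variables are expressed as a percentage of each animal's own baseline, uses k = 3 with fixed starting centroids so the result is reproducible, ranks the clusters as levels 0 to 2 by their distance from a hypothetical undisturbed point (100%, 100%), and allocates new animals with a nearest-centroid rule. All training and day 7 values, and all function names, are hypothetical.

```python
import math
from collections import Counter

# (body weight, VWR), both as % of the animal's own baseline (assumption)
Point = tuple[float, float]
UNDISTURBED: Point = (100.0, 100.0)  # hypothetical reference: no change from baseline


def kmeans(points: list[Point], centroids: list[Point], iters: int = 100) -> list[Point]:
    """Lloyd's algorithm from fixed start centroids; returns centroids ordered as levels 0, 1, 2."""
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in centroids]
        for p in points:
            # Assignment step: each point joins its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    # Rank clusters by distance from the undisturbed point: closest = level 0.
    return sorted(centroids, key=lambda c: math.dist(c, UNDISTURBED))


def severity_level(p: Point, levels: list[Point]) -> int:
    """Nearest-centroid allocation of one animal to a severity level."""
    return min(range(len(levels)), key=lambda i: math.dist(p, levels[i]))


def level_distribution(points: list[Point], levels: list[Point]) -> dict[int, float]:
    """Percentage of animals of one group allocated to each severity level."""
    counts = Counter(severity_level(p, levels) for p in points)
    return {lvl: 100 * counts[lvl] / len(points) for lvl in range(len(levels))}


# Hypothetical training data (three loose groups) and day 7 data for one group.
training = [(102, 98), (100, 105), (101, 97),
            (98, 62), (97, 58), (99, 60),
            (90, 22), (89, 18), (91, 20)]
levels = kmeans(training, centroids=[(100.0, 100.0), (95.0, 60.0), (90.0, 30.0)])
day7_group = [(92, 25), (91, 30), (95, 55)]
print(levels)
print(level_distribution(day7_group, levels))
```

With nearest-centroid allocation, the border between two neighbouring levels is the perpendicular bisector between their centroids; the two borders of the published model may have been defined differently, and real data would first need to be checked for comparable scaling of both variables.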

Generally, since mice are gregarious animals, it is recommended to avoid individual housing whenever possible.8,9 Indeed, many studies have compared individual versus group housing of rodents under laboratory conditions. Späni et al.20 observed a higher heart rate in singly housed male mice compared with group-housed individuals, pointing to potential discomfort. However, results of behavioural tests in group- versus individually housed mice are diverse. Female BALB/c mice housed without conspecifics have been reported to show higher levels of anxiety-related parameters, in contrast to C57BL/6 females.11 In addition, no effects of individual housing on behaviour were observed in the modified hole board test, and single housing did not lead to increased stress markers.10,11,21 Furthermore, in a study investigating behaviour and recovery after minor laparotomy, no negative effects related to single housing were observed.10 However, the fact that some studies did not report any negative effects of single housing does not make a convincing case for recommending single housing. In a preference study,12 even subordinate male mice preferred companionship when presented with the choice between an empty cage and a cage inhabited by another male mouse. However, aggressive behaviour in male mice might require separation to prevent severe injury or even death. As this can apply even to female mice of certain strains, genetic influences on social behaviour should also be taken into consideration.

Companionship itself has a marked influence on the home cage activity of mice,22 and social interaction has a profound effect on certain parameters. For example, in mice subjected to restraint stress, the presence of a conspecific diminishes the negative impact of the treatment on working memory.23 Furthermore, significantly elevated stress hormone levels were detected in non-tested members of a group in which only some of the mice were submitted to behavioural testing.11 Empathic behaviour has also been described in a three-chambered social approach test.24 Therefore, the condition of an individual might alter the behaviour of other group members and vice versa. However, owing to technical limitations in assessing individual behaviour within a group, single housing has so far been required for VWR. Moreover, the effects of social interaction, or its deprivation, on VWR have not yet been characterized in depth. Interestingly, in this study, group-housed mice showed about 50% less VWR than the single-housed mice analysed in a previous study.5 This marked reduction in VWR related to social housing is in accordance with results from Sherwin,22 who found a marked decrease in the motivation of mice to access a running wheel when housed in groups. In addition, Dewan et al.25 observed a reluctance to run in wheels that had previously been used by other individuals. Also, behavioural testing of single-housed male B6 and DBA mice showed varying results, but a distinguishable isolation-related hyperactivity.26 Thus, group housing may avoid confounding VWR data by hyperactivity or stereotypic behaviour. Therefore, a system that allows assessment of VWR at the individual level despite group housing is highly desirable.

In summary, the present study demonstrates that VWR is a suitable indicator of disturbed wellbeing also in group-housed mice. The implementation of RFID technology into an automated wheel running system enables individual severity assessment without impairing the natural social behaviour of the animals.

Acknowledgment
AB and CH share senior authorship.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Federal Ministry of Economics and Energy, Zentrales Innovationsprogramm Mittelstand (ZIM) (grant/award number: KF3465361TS4), Deutsche Forschungsgemeinschaft (grant/award number: FOR 2591, ‘BL953/10-1’).

ORCID iDs
André Bleich https://orcid.org/0000-0002-3438-0254
Christine Häger https://orcid.org/0000-0002-6971-9780

References
1. Sherwin CM. Voluntary wheel running: A review and novel interpretation. Anim Behav 1998; 56: 11–27.
2. Novak CM, Burghardt PR and Levine JA. The use of a running wheel to measure activity in rodents: Relationship to energy balance, general activity, and reward. Neurosci Biobehav Rev 2012; 36: 1001–1014.
3. Cobos EJ, Ghasemlou N, Araldi D, et al. Inflammation-induced decrease in voluntary wheel running in mice: A nonreflexive test for evaluating inflammatory pain and analgesia. Pain 2012; 153: 876–884.
4. Sheahan TD, Siuda ER, Bruchas MR, et al. Inflammation and nerve injury minimally affect mouse voluntary behaviors proposed as indicators of pain. Neurobiol Pain 2017; 2: 1–12.
5. Häger C, Keubler LM, Talbot SR, et al. Running in the wheel: Defining individual severity levels in mice. PLoS Biol 2018; 16: e2006159.
6. Russell WMS and Burch RL. The principles of humane experimental technique. Wheathampstead, UK: Universities Federation for Animal Welfare, 1959.
7. Kappel S, Hawkins P and Mendl MT. To group or not to group? Good practice for housing male laboratory mice. Animals (Basel) 2017; 7: E88.
8. EU. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes. Official Journal of the European Union 2010; L276/233–L276/279.
9. Olsson IAS and Westlund K. More than numbers matter: The effect of social factors on behaviour and welfare of laboratory rodents and non-human primates. Appl Anim Behav Sci 2007; 103: 229–254.
10. Jirkof P, Cesarovic N, Rettich A, et al. Individual housing of female mice: Influence on postsurgical behaviour and recovery. Lab Anim 2012; 46: 325–334.
11. Arndt SS, Laarakker MC, van Lith HA, et al. Individual housing of mice: Impact on behaviour and stress responses. Physiol Behav 2009; 97: 385–393.
12. Van Loo PL, van de Weerd HA, van Zutphen LF, et al. Preference for social contact versus environmental enrichment in male laboratory mice. Lab Anim 2004; 38: 178–188.
13. Mähler M, Berard M, Feinstein R, et al. FELASA recommendations for the health monitoring of mouse, rat, hamster, guinea pig and rabbit colonies in breeding and experimental units. Lab Anim 2014; 48: 178–192.
14. Häger C, Keubler LM, Biernot S, et al. Time to integrate to nest test evaluation in a mouse DSS-colitis model. PLoS One 2015; 10: e0143824.
15. Moolenbeek C and Ruitenberg EJ. The "Swiss roll": A simple technique for histological studies of the rodent intestine. Lab Anim 1981; 15: 57–59.
16. Kandasamy R, Lee AT and Morgan MM. Depression of home cage wheel running: A reliable and clinically relevant method to assess migraine pain in rats. J Headache Pain 2017; 18: 5.
17. Tubbs JT, Kissling GE, Travlos GS, et al. Effects of buprenorphine, meloxicam, and flunixin meglumine as postoperative analgesia in mice. J Am Assoc Lab Anim Sci 2011; 50: 185–191.
18. Mason G. Stereotypies: A critical review. Anim Behav 1991; 41: 1015–1037.
19. Meijer JH and Robbers Y. Wheel running in the wild. Proc Biol Sci 2014; 281.
20. Späni D, Arras M, Konig B, et al. Higher heart rate of laboratory mice housed individually vs in pairs. Lab Anim 2003; 37: 54–62.
21. Kamakura R, Kovalainen M, Leppaluoto J, et al. The effects of group and single housing and automated animal monitoring on urinary corticosterone levels in male C57BL/6 mice. Physiol Rep 2016; 4: e12703.
22. Sherwin CM. Social context affects the motivation of laboratory mice, Mus musculus, to gain access to resources. Anim Behav 2003; 66: 649–655.
23. Kim JW, Ko MJ, Gonzales EL, et al. Social support rescues acute stress-induced cognitive impairments by modulating ERK1/2 phosphorylation in adolescent mice. Sci Rep 2018; 8: 12003.
24. Ueno H, Suemitsu S, Murakami S, et al. Empathic behavior according to the state of others in mice. Brain Behav 2018; 8: e00986.
25. Dewan I, Garland T, Hiramatsu L, et al. I smell a mouse: Indirect genetic effects on voluntary wheel-running distance, duration and speed. Behav Genet 2019; 49: 49–59.
26. Võikar V, Polus A, Vasar E, et al. Long-term individual housing in C57BL/6J and DBA/2 mice: Assessment of behavioral consequences. Genes Brain Behav 2005; 4: 240–252.

Résumé Voluntary wheel running (VWR) behaviour is a sensitive indicator of disturbed wellbeing that is used to assess various levels of experimental severity in laboratory mice. However, monitoring VWR performance usually requires housing in individual cages, which itself could have a negative effect on wellbeing. Taking the 3R principle into account, VWR behaviour was evaluated under group-housing conditions. To test its applicability for severity assessment, these data were evaluated in a mouse model of dextran sodium sulphate (DSS)-induced colitis. An automated system integrating RFID (radio-frequency identification) technology was used to detect individual VWR and to allow continuous monitoring. After a 14-day adaptation period, the mice showed stable wheel-running performance. Analysis during DSS treatment combined with repeated blood and faecal sampling revealed a marked reduction in VWR behaviour during colitis. VWR subsequently increased during recovery. Mice subjected to blood and faecal sampling but without any DSS treatment showed a less pronounced reduction in VWR but a more prolonged recovery. Applying a cluster model that differentiates individual severity levels based on VWR and body weight data revealed the highest severity level in most DSS-treated mice on day 7, but a large number of control mice also showed elevated severity levels due solely to the sampling procedures. In summary, VWR sensitively reflected the severity of DSS colitis and the impact of sample collection. Therefore, VWR monitoring is an appropriate method for detecting disturbed wellbeing due to the sampling procedure and DSS colitis in group-housed laboratory mice.

Abstract Voluntary wheel running (VWR) behaviour is a sensitive indicator of impaired wellbeing and serves to assess individual severity levels in experiments with laboratory mice. However, monitoring individual VWR performance usually requires single housing, which in turn can negatively affect wellbeing. In consideration of the 3R principle, VWR was evaluated under group-housing conditions. To test the validity of this measure for severity assessment, it was evaluated in a dextran sodium sulphate (DSS)-induced colitis model. For continuous monitoring, an automated system with integrated RFID (radio-frequency identification) technology was used, which enables the detection of individual VWR activity. After a 14-day adaptation period, the mice showed stable running performance. Analysis during DSS treatment in combination with repeated phlebotomy and faecal sampling revealed markedly reduced VWR during the course of colitis and increased VWR during the recovery phase. Mice subjected to phlebotomy and faecal sampling but no DSS treatment showed a less pronounced reduction in VWR but a longer recovery time. Application of a cluster model that determined individual severity levels on the basis of VWR and body weight data revealed the highest severity level in most of the DSS-treated mice on day 7; however, a considerable number of control mice also showed elevated severity levels due to the sampling procedures alone. In summary, VWR reflected the course of DSS colitis severity and the impact of sampling with high sensitivity. VWR monitoring is therefore a suitable method for detecting disturbed wellbeing after DSS colitis induction and sampling procedures in group-housed female laboratory mice.

Resumen Voluntary wheel running (VWR) behaviour is a sensitive indicator of disturbed wellbeing and is used to assess individual levels of experimental severity in laboratory mice. However, monitoring individual VWR performance normally requires single housing, which may have a negative effect on wellbeing. In consideration of the 3R principle, VWR behaviour was evaluated under group-housing conditions. To test its applicability for severity assessment, this outcome was evaluated in a model of dextran sodium sulphate (DSS)-induced colitis. For continuous monitoring, an automated system with integrated RFID (radio-frequency identification) technology was used, which allows the detection of individual VWR. After a 14-day adaptation period, the mice showed stable running performance. Analysis during DSS treatment combined with repeated phlebotomy and faecal sampling showed a significant reduction of VWR behaviour during the course of colitis and an increase in VWR during recovery from the disease. Mice subjected to phlebotomy and faecal sampling but no DSS treatment showed a smaller reduction in VWR but also a longer-lasting recovery. Application of a cluster model that differentiates individual severity levels based on VWR and body weight data revealed the highest severity level in most DSS-treated mice on day 7, although a considerable number of control mice also showed elevated severity levels resulting from the sampling procedures alone. In summary, VWR sensitively indicated the course of DSS colitis severity and the impact of sample collection. Therefore, VWR monitoring is a suitable method for detecting disturbed wellbeing due to DSS colitis and the sampling procedure in group-housed female laboratory mice.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 73–82
© The Author(s) 2019
DOI: 10.1177/0023677219881481
journals.sagepub.com/home/lan

A safe bet? Inter-laboratory variability in behaviour-based severity assessment

Paulin Jirkof1,2, Ahmed Abdelrahman5, André Bleich3, Mattea Durst1, Lydia Keubler3, Heidrun Potschka4, Birgitta Struve3, Steven R Talbot3, Brigitte Vollmar5, Dietmar Zechner5 and Christine Häger3

Abstract Evidence-based severity assessment is essential as a basis for ethical evaluation in animal experimentation to ensure animal welfare, legal compliance and scientific quality. To fulfil these tasks, scientists, animal care staff and veterinary personnel need assessment tools that provide species-relevant measurements of the animals' physical and affective state. In a three-centre study, the inter-laboratory robustness of body weight monitoring, the mouse grimace scale (MGS) and the burrowing test was evaluated. The parameters were assessed in naïve and tramadol-treated female C57BL/6J mice. In all laboratories, body weight decreased during tramadol treatment and increased again once treatment was terminated. Tramadol treatment did not affect the MGS or burrowing performance. Results were qualitatively comparable between the laboratories but differed significantly in quantitative terms (inter-laboratory analysis). Burrowing behaviour appears to be highly sensitive to inter-laboratory differences in the testing protocol. All locations obtained comparable information regarding the qualitative effect of tramadol treatment in C57BL/6J mice; however, the datasets differed as a result of differences in test and housing conditions. In conclusion, our study confirms that results of behavioural testing can be affected by many factors and may differ between laboratories. Nevertheless, the evaluated parameters appeared relatively robust even when conditions were not extensively harmonized, and they represent useful tools for severity assessment. However, analgesia-related side effects on these parameters have to be considered carefully.

Keywords severity assessment, behaviour, multi-centre study, burrowing, mouse grimace scale, tramadol

Date received: 13 March 2019; accepted: 16 September 2019

1Division of Surgical Research, University Hospital Zurich, Switzerland
2Department of Animal Welfare and 3Rs, University of Zurich, Switzerland
3Institute for Laboratory Animal Science and Central Animal Facility, Hannover Medical School, Germany
4Institute of Pharmacology, Toxicology and Pharmacy, Ludwig-Maximilians-University, Germany
5Rudolf-Zenker-Institute of Experimental Surgery, Rostock University Medical Center, Germany

Corresponding author: Paulin Jirkof, University of Zurich, Winterthurerstr. 190 Zurich, ZH 8057 Switzerland.

Behavioural assessment parameters should ideally be valid, specific and sensitive. Especially if they are applied for the statutory reporting of severity grades, as required in Switzerland and the EU, intra- and inter-laboratory robustness and comparability of assessment methods are of utmost importance. Small differences between laboratories should not strongly affect test read-outs; otherwise, general statements on the severity of specific procedures will hardly be possible. Nevertheless, recent reports indicate problems with the reproducibility and replicability of behavioural tests. Several multi-centre studies that assessed laboratory mouse behaviour using standardized test protocols and comparable test conditions conclude that, while several behaviours, for example home cage activity,1 seem to be robustly exhibited by mice in different laboratories, the results of certain behavioural tests differed distinctly between laboratories.2–4 However, some authors conclude that even though the absolute test results varied between laboratories, the qualitative differences detected between treatment groups were consistent across laboratories. These authors concluded that behavioural testing remains reliable if appropriate measures for standardization are applied and suitable controls are included.2 Differences in test conditions (for example testing order, time of test, type of test apparatus, testing room, handling methods or experimenter sex), environmental conditions (e.g. season, lighting, humidity levels) and animal characteristics such as diet and microbiome or weaning age and litter size have been identified as underlying causes of the discrepancies in behavioural read-outs between laboratories.1,3–6

In the present study, we tested four simple indicators that are increasingly used in rodent severity assessment for their robustness. Mouse grimace scale (MGS), burrowing performance, water intake and body weight progression were assessed in naïve and analgesia-treated mice in three different laboratories (Hannover (H), Rostock (RO), Zurich (Z)). The MGS is a popular and validated tool to assess pain in mice based on changes in facial expression.7 Based on the outcome of its initial validation, it is applied mainly in acute pain states such as the post-surgical phase. Burrowing is a more complex, spontaneous behaviour that has been shown to be an easy-to-apply tool for assessing brain damage and neurodegenerative diseases or for monitoring sickness, but it is also known to be affected in several stressful or painful conditions in rats and mice.8–10 The burrowing test is based on the species-specific behaviour of burrow-digging rodents to displace items from tube-like structures.8 Monitoring changes in body weight and water intake is a classical tool to assess health status and overall wellbeing. A predefined reduction in weight is often used to determine humane endpoints (i.e. criteria for experiment termination).11,12

Side effects of analgesia on behaviour and body weight of rodents have been described repeatedly.13 In the present study, we applied the opioid analgesic tramadol, which is administered for the treatment of moderate to severe pain, both acute and chronic, in various species, including humans and rodents.14 Even though tramadol has low bioavailability, plasma levels of tramadol and its active metabolite mono-O-desmethyltramadol (M1) are stable during oral self-consumption in mice.15,16 Tramadol is, therefore, of interest for use in stress-free, oral analgesia protocols.

For conducting the study, all three laboratories agreed on a common experimental schedule, common standard operating procedures (SOPs) for behavioural measurements and drug delivery, and the same sex and strain of the animal subjects. All experiments were conducted between July and September 2018. No other standardization measures were implemented. The project design therefore resembled a real-life scenario with several independent laboratories following published SOPs, rather than a systematic multi-centre approach with major efforts for prior standardization and harmonization.

We hypothesized that the results of all tested severity assessment measurements are comparable between laboratories and that opioid analgesia affects test results.

Ethics statement

All experiments were in accordance with the European Directive 2010/63/EU of the European Parliament and of the Council on the Protection of Animals used for Scientific Purposes. The studies were conducted in accordance with Swiss and German animal protection law.

Animals and methods

A detailed description of animals and methods is given in the supplemental material.

Animals

Adult female C57BL/6J mice were used for all experiments.

Standard housing conditions

During habituation, mice were housed in groups of four. Environmental conditions differed between the laboratories (see supplemental material).

Experimental protocol

After two weeks of habituation, baseline values (bsl) were assessed on days 1–2 (burrowing) or days 1–3 (MGS, body weight). During these days, drinking water enriched with 0.5% sucrose (Sucrose BioXtra, ≥99.5% (GC), Sigma-Aldrich GmbH) was provided. Experimental values were assessed on the following days with 0.5% sucrose-enriched drinking water containing 1 mg/ml tramadol (Tramal drops 100 mg/ml, Grünenthal Pharma AG, Mitlödi, Switzerland, or ratiopharm GmbH, Ulm, Germany), given from the start of day 3 until the start of day 6 (for details see Table 1). The water–sucrose–tramadol mixture was replaced every 24 h.
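The tramadol drinking solution described above follows from a simple dilution (C1V1 = C2V2) of the 100 mg/ml stock to the target of 1 mg/ml. A minimal sketch of the per-batch arithmetic is shown below; it assumes that the 0.5% sucrose is weight per volume (g per 100 ml) and that the volume added by dissolved sucrose is negligible. The function name and the 250 ml batch size are hypothetical and not part of the published SOPs.

```python
def tramadol_drinking_solution(final_volume_ml: float,
                               stock_mg_per_ml: float = 100.0,
                               target_mg_per_ml: float = 1.0,
                               sucrose_percent_wv: float = 0.5) -> tuple[float, float, float]:
    """Return (stock volume in ml, sucrose in g, water in ml) for one batch.

    Dilution follows C1 * V1 = C2 * V2. Assumes the sucrose percentage is w/v
    and that dissolved sucrose adds negligible volume (hypothetical helper).
    """
    if not 0 < target_mg_per_ml <= stock_mg_per_ml:
        raise ValueError("target concentration must be positive and not exceed the stock")
    stock_ml = final_volume_ml * target_mg_per_ml / stock_mg_per_ml  # V1 = C2 * V2 / C1
    sucrose_g = final_volume_ml * sucrose_percent_wv / 100.0         # g per 100 ml
    water_ml = final_volume_ml - stock_ml                            # remainder of the batch
    return stock_ml, sucrose_g, water_ml


# Hypothetical 250 ml batch, prepared fresh each day because the mixture is replaced every 24 h.
stock, sucrose, water = tramadol_drinking_solution(250)
print(f"stock {stock} ml, sucrose {sucrose} g, water {water} ml")
```

Keeping concentrations as parameters makes the same helper usable for the sucrose-only baseline water (target_mg_per_ml would simply be omitted from the batch) or for other stock strengths, which is why no value is hard-coded.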

Table 1. Experimental schedule after habituation. Mouse grimace scale (MGS), body weight, water intake and burrowing behaviour were assessed daily.

Day | Start of light phase | 2 h before dark phase | Beginning of dark phase
1 | Assessment MGS; start 0.5% sucrose | Body weight; start burrowing test | Read out burrowing 2 h
2 | Assessment MGS; read out burrowing 12 h; 0.5% sucrose | Body weight; start burrowing test | Read out burrowing 2 h
3 | Assessment MGS; read out burrowing 12 h; start 0.5% sucrose + tramadol | Body weight; start burrowing test | Read out burrowing 2 h
4 | Assessment MGS; read out burrowing 12 h; 0.5% sucrose + tramadol | Body weight; start burrowing test | Read out burrowing 2 h
5 | Assessment MGS; read out burrowing 12 h; 0.5% sucrose + tramadol | Body weight; start burrowing test | Read out burrowing 2 h
6 | Assessment MGS; read out burrowing 12 h; start 0.5% sucrose | Body weight; start burrowing test | Read out burrowing 2 h
7 | Assessment MGS; read out burrowing 12 h | Body weight; – | –
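The daily read-outs listed in Table 1 reduce to a few per-animal calculations. The sketch below is illustrative only and rests on assumptions not stated in the protocol: body weight change is expressed relative to the mean of the baseline days, the burrowing read-out is the percentage of the pre-weighed pellet load displaced from the tube, and each MGS frame score is taken as the mean of its action-unit scores (0–2) before averaging over the at least five frames per animal and time point. Function names and example values are hypothetical.

```python
from statistics import mean


def percent_weight_change(weight_g: float, baseline_weights_g: list[float]) -> float:
    """Body weight change relative to the mean baseline weight (assumed bsl definition)."""
    bsl = mean(baseline_weights_g)
    return 100.0 * (weight_g - bsl) / bsl


def percent_pellets_removed(initial_g: float, remaining_g: float) -> float:
    """Share of the pre-weighed pellet load displaced from the burrowing tube."""
    return 100.0 * (initial_g - remaining_g) / initial_g


def mgs_animal_mean(frame_scores: list[list[int]]) -> float:
    """Mean MGS of one animal at one time point.

    Each inner list holds the action-unit scores (0-2) of one frame; a frame
    score is taken as their mean (assumption), then frames are averaged.
    """
    if len(frame_scores) < 5:
        raise ValueError("at least 5 scored frames per animal and time point expected")
    return mean(mean(units) for units in frame_scores)


# Hypothetical example values for one mouse.
print(round(percent_weight_change(19.4, [20.0, 20.1, 19.9]), 1))  # body weight, % change from bsl
print(percent_pellets_removed(200.0, 40.0))                        # 2 h burrowing read-out
print(mgs_animal_mean([[0, 1, 0, 1, 0]] * 5))                      # mean MGS over five frames
```

Averaging per animal first, as the protocol does for the MGS, keeps each mouse as one observation in later statistics rather than letting animals with more scored frames weigh more.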

Data acquisition

Animals were housed in groups but separated overnight for each burrowing test during baseline and experimental measurements.

For the burrowing test, each animal was provided with a tube-like apparatus filled with pre-weighed food pellets (200 ± 10 g) 2–3 h before the beginning of the dark phase. The filled tube was weighed after 2 h and at the end of the dark phase (12 h) to determine the amount of food pellets removed.8

Body weights were assessed daily with a precision scale. Water intake was measured in group-housed as well as single-housed animals by weighing the drinking bottles on a precision scale.

For assessment of the MGS, mice were filmed in polycarbonate boxes. The mice were allowed to acclimatize in the boxes for 2 min and were then filmed for 5 (Z), 12 (H) or 30 min (RO). Images were grabbed as screenshots by automatic frame production and selection software, producing at least 5 (max. 8) clear images per animal per time point, and the acquired frames were automatically randomized.17,18 Images were scored according to the scoring scheme described by Langford et al.,7 and MGS means were calculated per animal.

Statistical analyses

Intra-laboratory results. Intra-laboratory data were analysed using a linear mixed-effects regression model with experimental regimes (days) as fixed effects to control for repeated measures. General p values were obtained by ANOVA and post-hoc analysis. Further details on the statistical procedures are given in the supplemental material.

Inter-laboratory comparisons. Inter-laboratory data were analysed by group-wise comparisons of measured values. Each time frame was analysed using ANOVA and subsequent post-hoc tests. 95% confidence intervals (CIs) were plotted to visualize differences between the laboratories. Further details on the statistical procedures are given in the supplemental material.

Results

Intra-laboratory results

Body weight. In Z, a significant loss in body weight was observed after the start of tramadol administration on day 4 (2.6%; p < 0.01), followed by an increase to bsl level by day 6 (Figure 1(a)). In RO, animals also showed significant weight reductions from day 4 onwards (4.7%; p < 0.001), followed by an increase that restored bsl on day 7 (Figure 1(b)). In H, mice showed a significant drop in body weight from day 5 (2.3%; p < 0.01), which decreased further until day 6 (3.2%; p < 0.001) compared to bsl. After tramadol treatment was stopped, body weights reached bsl levels on day 7 (Figure 1(c)).

Figure 1. Percentage changes in individual body weights during tramadol administration compared to baseline (bsl) in (a) Z, (b) RO and (c) H. Grey background: sucrose in drinking water; white background: tramadol and sucrose in drinking water. Significance: p ≤ 0.05 (*), p ≤ 0.01 (**) and p ≤ 0.001 (***).

MGS. In none of the three laboratories did MGS values differ significantly between bsl and tramadol treatment (Figure 2).

Figure 2. Mouse grimace scales during tramadol administration compared to baseline (bsl) in (a) Z, (b) RO and (c) H. Black dots represent outliers. Grey background: sucrose in drinking water; white background: tramadol and sucrose in drinking water. No significance was found.

Burrowing behaviour. The percentage of pellets removed after 2 h was low and variable during baseline in all three laboratories. During the adaptation phase, 2 h burrowing of mice in Z reached only 34% (Figure 3(a)), whereas 12 h burrowing reached 95% (Figure 3(b)). With the beginning of tramadol treatment on day 3, 2 h burrowing behaviour increased significantly, up to 80% (p < 0.001), and remained increased compared to bsl until day 6 (63%) of the experiment. Burrowing on days 4 (p < 0.05) and 6 (p < 0.01) was significantly lower than on day 3 (Figure 3(a)). Overnight (12 h) burrowing was significantly reduced on day 7 (82%) compared to bsl (p < 0.05), day 4 (p < 0.01) and day 6 (p < 0.05) (Figure 3(b)). In RO, burrowing activity during the 2 h period was low (Figure 3(c)), and the percentage of pellets removed overnight (12 h) did not exceed 50%. Although these animals generally burrowed less, there was a significant decrease on days 4, 5, 6 and 7 (p < 0.001) compared to bsl (Figure 3(d)). During bsl, mice in H removed 31% of the pellets in the 2 h period. Similar to the results from Z, burrowing increased significantly on days 3, 4, 5 and 6 (p < 0.05 to p < 0.01) compared to bsl (Figure 3(e)).

Drinking water intake. The individual water intake per hour (overnight) was 0.73 ± 0.22 ml (Z), 0.4 ± 0.05 ml (H) and 0.81 ± 0.08 ml (RO) at baseline (days 1–2) and 0.66 ± 0.1 ml (Z), 0.4 ± 0.07 ml (H) and 0.68 ± 0.06 ml (RO) during tramadol treatment (days 3–6). No statistical analyses were performed because bottle weights were affected by technical problems in some laboratories.

Inter-laboratory comparisons

The percentage change in body weight differed significantly between the laboratories from day 4 to 6 (p < 0.05; p < 0.001) but not on day 7. Individual laboratories were considered not to differ when the CI of the mean/median difference crossed 0; accordingly, on day 4 Z to H was not different, on day 5 RO to H and Z to H and on day 6 RO to H
The overnight (Figure 4(a)). Despite the fact that MGS was not dif- measurements showed 100% burrowed pellets with no ferent within the labs, medians were significantly differ- significant differences to baseline (Figure 3(f)). ent at bsl (p < 0.001) and on days 5 (p < 0.001) and Jirkof et al. 77

Figure 3. Percentage of pellets removed from burrowing apparatus during tramadol administration compared to base- line (bsl) after 2 h in (a) Z, (c) RO and (e) H and overnight (approx. 12 h) in (b) Z, (d) RO and (f) H. Grey background: sucrose in drinking water; white background tramadol and sucrose in drinking water. Significance: p 0.05 (*), p 0.01 (**) and p 0.001 (***).

6(p < 0.05) (Figure 4(b)). At all time points in compari- Discussion son of all laboratories 2 h and 12 h measurements of burrowing testing were significantly different Assessment methods used for the statutory reporting (p < 0.001). However, when comparing each laboratory of severity grades should ideally deliver robust and against the other, differences were present between comparable results in and between laboratories. Z and H on days 3 and 4 for the 2 h test (Figure 4(c)) Multi-centre approaches may provide robust validation and at all other time points for the 12 h test and evidence of reproducibility of behavioural meas- (Figure 4(d)). urements as, for example, shown for the use of 78 Laboratory Animals 54(1)

Figure 4. Inter-laboratory comparisons of the 95% CIs of (a) body weight (BWC), (b) median scores of the MGS, (c) burrowing 2 h and (d) burrowing 12 h. Grey: significantly different; white: not significantly different. bsl: baseline. Significance: p 0.05 (*), p 0.01 (**) and p 0.001 (***). burrowing behaviour as a pain indicator in rats.10 Here, treatment when planning experiments, designing score we present results of a three-centre study resembling a sheets and performing severity assessment. Nausea and real-life scenario with several independent laboratories emetic effects constitute well-known adverse effects of following published test SOPs, rather than a systematic tramadol in clinical use.20 The associated discomfort multi-centre approach with major efforts for prior and lack of appetite can result in indirect effects of tra- standardization and harmonization. madol administration on food intake and body-weight As one of the most frequently used and objective clin- development as also described for other opioids in ical parameters, body weights were monitored through- mice.13 While we cannot rule out that the repeated testing out the experiments showing decreases in all three of our mice induced a stress response and consequently laboratories when tramadol was administered. reduced body weight, the rapid recovery after termin- However, body weights recovered immediately in all ation of tramadol administration renders an adverse laboratories when tramadol was not administered any- effect of the analgesic drug more likely. Moreover, tra- more. Highest decrease in body weight of 4.7% was madol is of bitter taste and body-weight reduction can be observed in RO and lowest in Z with 2.6%. Given that a result of decreased water intake. Water intake was dif- percentage body weight decreases are important evalu- ficult to analyse due to technical problems in some labora- ation criteria in experimental score sheets and frequently tories (i.e. 
water loss during manipulation of the cages). used as termination criteria or humane endpoints,19 it is Nevertheless, the water intake of 4.8–8.16 ml per night, advisable to consider this adverse effect of analgesia suggests a therapeutic dosing of tramadol, that is sufficient Jirkof et al. 79 for an estimated serum concentration high enough to pro- pronounced deviations were determined for CIs of the vide pain relief.15 burrowing test. Analgesia is one of many experimental interventions However, when comparing only two out of three applied to laboratory mice and one should be aware of laboratories, there are also similarities between the its potential side effects. In addition to its analgesic datasets from Z and H and Z and RO regarding body potency, tramadol exerts antidepressant-like effects weight and MGS. Looking at the burrowing datasets that have been attributed to the effects on monoamine there were only similarities between Z and H. uptake. Respective effects have been reported based on It seems that some quantitative variation across studies evaluating the impact of tramadol on behavioural laboratories is detected in most multi-centre studies, patterns in different paradigms, for example, forced which has not necessarily to compromise overall differ- swimming test.21,22 Thus, it was of particular interest to ences and qualitative conclusions.2 Additionally, some study the impact of tramadol on behavioural parameters. behaviours seem to be less affected by environmental fac- The pain-indicating MGS was analysed in order to assess tors than others.1,24 Several authors highlight the impact a respective impact of tramadol on pain assessment par- of even minor changes in environmental factors on the ameters. 
In all laboratories MGS was not affected by outcome of animal-based experimental research.25,26 tramadol or by repeated testing procedures, which is Nevertheless, many sources of laboratory-related vari- comparable to the results of a study investigating bupre- ability remain unidentified and the relative impact of fac- norphine.23 In contrast, in another study, also using tra- tors is still unclear.5 Some authors,27,28 therefore, argue madol via drinking water after an initial injection of for more standardization in behavioural studies. In this tramadol, a slight increase of MGS was observed.16 context, it also needs to be considered that highly stan- Assessment of burrowing behaviour, another dardized experiments may represent ‘local truths’ with common pain-indicator, showed comparable results in little external validity.3 The negative implications of this Z and H but not RO. In naı¨ ve animals burrowing activ- ‘standardization fallacy’ problem have been intensely ity is expected to be high and test bottles are normally discussed.29 As we have not harmonized factors like empty or nearly empty after a 12 h test period as animal breeder, housing or handling between the three observed in Z and H, while animals that suffer from laboratories, we cannot conclude on the potential impact pain are expected to leave more material in the bottle of these factors on our results. after 12 h.8 Animals in RO showed surprisingly little In conclusion, our study confirms that results of burrowing behaviour throughout the first 2 h of the test- behavioural testing can be affected by many factors and ing and displayed significant lower performance over may differ in various laboratories. Nevertheless, the eval- night testing than in the other locations. A factor that uated parameter appeared relatively robust even when might contribute to this discrepancy is the burrowing not harmonized extensively. One can assume that when apparatus. 
Whereas Z and H used a bottle with a these tests are used for the evaluation of pain or stressful volume of 250 ml, RO used a bottle with a volume of experimental procedures effect sizes are increased, and 900 ml. In the underlying SOP the length of the bottle inter-laboratory differences become less prominent. and the diameter of the opening were fixed, but not Therefore, these tests present useful tools for severity volume or diameter of the bottle itself. Taking into assessment. Furthermore, analgesia-related side effects account, that the diameter of the burrowing apparatus on parameters have to be considered carefully. has been confirmed as a critical factor in previous stu- dies,8 the difference in test performance can probably be Acknowledgement attributed to this factor. Another confounding factor The authors would like to thank Margarete Arras for her arises from a low and variable burrowing behaviour valuable support. during the days intended for bsl acquisition demon- strated by mice in Zurich and Hannover. Mice were Declaration of Conflicting Interests not habituated to single housing prior to baseline meas- The author(s) declared no potential conflicts of interest with urements. Therefore, the adaption phase was probably respect to the research, authorship, and/or publication of this not long enough and single housing may have influenced article. burrowing behaviour. Adequate adaptation time, con- sidering all aspects of handling and housing, as well as Funding adequate assessment time points seems to be crucial The author(s) disclosed receipt of the following financial sup- when using the burrowing test. port for the research, authorship and/or publication of this Overall, mean differences of parameters were signifi- article: This study was supported by the Deutsche cantly different in the inter-laboratory comparison. 
Forschungsgemeinschaft (DFG research group FOR 2591, Whereas least differences were detectable for CIs in grant number: JI 276/1-1, ZE 712/1-1, VO 450/15-1 and BL body weight and median scores of MGS, the most 953/10-1). 80 Laboratory Animals 54(1)

ORCID iDs
Paulin Jirkof https://orcid.org/0000-0002-7225-2325
André Bleich https://orcid.org/0000-0002-3438-0254
Lydia Keubler https://orcid.org/0000-0002-8738-9877
Heidrun Potschka https://orcid.org/0000-0003-1506-0252
Steven R Talbot https://orcid.org/0000-0002-9062-4065
Dietmar Zechner https://orcid.org/0000-0002-2075-7540
Christine Häger https://orcid.org/0000-0002-6971-9780

Supplemental Material
Supplemental material for this article is available online.

References
1. Robinson L, Spruijt B and Riedel G. Between and within laboratory reliability of mouse behaviour recorded in home-cage and open-field. J Neurosci Methods 2018; 300: 10–19.
2. Lewejohann L, Reinhard C, Schrewe A, et al. Environmental bias? Effects of housing conditions, laboratory environment and experimenter on behavioral tests. Genes Brain Behav 2006; 5: 64–72.
3. Richter SH, Garner JP and Würbel H. Environmental standardization: Cure or cause of poor reproducibility in animal experiments? Nat Methods 2009; 6: 257–261.
4. Wahlsten D, Metten P, Phillips TJ, et al. Different data from different labs: Lessons from studies of gene-environment interaction. J Neurobiol 2003; 54: 283–311.
5. Chesler EJ, Wilson SG, Lariviere WR, et al. Influences of laboratory environment on behavior. Nat Neurosci 2002; 5: 1101–1102.
6. Jørgensen BP, Hansen JT, Krych L, et al. A possible link between food and mood: Dietary impact on gut microbiota and behavior in BALB/c mice. PLoS One 2014; 9: e103398.
7. Langford DJ, Bailey AL, Chanda ML, et al. Coding of facial expressions of pain in the laboratory mouse. Nat Methods 2010; 7: 447–449.
8. Jirkof P. Burrowing and nest building behavior as indicators of well-being in mice. J Neurosci Methods 2014; 234: 139–146.
9. Jirkof P, Cesarovic N, Rettich A, et al. Burrowing behavior as an indicator of post-laparotomy pain in mice. Front Behav Neurosci 2010; 4: 165.
10. Wodarski R, Delaney A, Ultenius C, et al. Cross-centre replication of suppressed burrowing behaviour as an ethologically relevant pain outcome measure in the rat: A prospective multicentre study. Pain 2016; 157: 2350–2365.
11. Hawkins P. Recognizing and assessing pain, suffering and distress in laboratory animals: A survey of current practice in the UK with recommendations. Lab Anim 2002; 36: 378–395.
12. Morton DB. A scheme for the recognition and assessment of adverse effects in animals. Dev an Vet 1997; 27: 235–240.
13. Jirkof P. Side effects of pain and analgesia in animal experimentation. Lab Anim 2017; 46: 123–128.
14. Lewis KS and Han NH. Tramadol: A new centrally acting analgesic. Am J Health Syst Pharm 1997; 54: 643–652.
15. Evangelista Vaz R, Draganov DI, Rapp C, et al. Preliminary pharmacokinetics of tramadol hydrochloride after administration via different routes in male and female B6 mice. Vet Anaesth Analg 2018; 45: 111–122.
16. Evangelista-Vaz R, Bergadano A, Arras M, et al. Analgesic efficacy of subcutaneous-oral dosage of tramadol after surgery in C57BL/6J mice. J Am Assoc Lab Anim Sci 2018; 57: 368–375.
17. Ernst L, Kopaczka M, Schulz M, et al. Improvement of the Mouse Grimace Scale set-up for implementing a semi-automated Mouse Grimace Scale scoring (Part 1). Lab Anim 2020; 54: 83–91.
18. Ernst L, Kopaczka M, Schulz M, et al. Semi-automated generation of pictures for the Mouse Grimace Scale: A multi-laboratory analysis (Part 2). Lab Anim 2020; 54: 92–98.
19. Talbot SR, Bruch S, Kießling F, et al. Design of a joint research data platform: A use case for severity assessment. Lab Anim 2020; 54: 33–39.
20. Goodman LS, Brunton LL and Knollmann BC. Goodman & Gilman's: The pharmacological basis of therapeutics. 12th ed. New York: McGraw-Hill, 2011.
21. Jesse CR, Wilhelm EA, Bortolatto CF, et al. Evidence for the involvement of the noradrenergic system, dopaminergic and imidazoline receptors in the antidepressant-like effect of tramadol in mice. Pharmacol Biochem Behav 2010; 95: 344–350.
22. Yalcin I, Coubard S, Bodard S, et al. Effects of 5,7-dihydroxytryptamine lesion of the dorsal raphe nucleus on the antidepressant-like action of tramadol in the unpredictable chronic mild stress in mice. Psychopharmacology 2008; 200: 497–507.
23. Miller A, Kitson G, Skalkoyannis B, et al. The effect of isoflurane anaesthesia and buprenorphine on the mouse grimace scale and behaviour in CBA and DBA/2 mice. Appl Anim Behav Sci 2015; 172: 58–62.
24. Richter SH, Garner JP, Zipser B, et al. Effect of population heterogenization on the reproducibility of mouse behavior: A multi-laboratory study. PLoS One 2011; 6: e16461.
25. Mogil JS. Laboratory environmental factors and pain behavior: The relevance of unknown unknowns to reproducibility and translation. Lab Anim 2017; 46: 136–141.
26. Toth LA. The influence of the cage environment on rodent physiology and behavior: Implications for reproducibility of pre-clinical rodent research. Exp Neurol 2015; 270: 72–77.
27. Van der Staay F and Steckler T. The fallacy of behavioral phenotyping without standardisation. Genes Brain Behav 2002; 1: 9–13.
28. Wahlsten D. Standardizing tests of mouse behavior: Reasons, recommendations, and reality. Physiol Behav 2001; 73: 695–704.
29. Würbel H. Behavioral phenotyping enhanced–beyond (environmental) standardization. Genes Brain Behav 2002; 1: 3–8.

Résumé
Evidence-based severity assessment is indispensable as a basis for the ethical evaluation of animal experimentation, in order to ensure animal welfare, legal compliance and scientific quality. To fulfil these scientific tasks, animal care staff and veterinarians need assessment tools that provide species-relevant measurements of the affective and physical state of the animals. In a three-centre inter-laboratory study, the robustness of body-weight monitoring, the Mouse Grimace Scale (MGS) and the burrowing test was evaluated. The parameters were evaluated in treatment-naive and tramadol-treated female C57BL/6J mice. In all laboratories, tramadol treatment caused a loss of body weight, followed by an increase once treatment was stopped. Tramadol affected neither the MGS score nor burrowing behaviour. The results proved qualitatively comparable between laboratories but quantitatively significantly different (inter-laboratory analysis). Burrowing behaviour appears to be very sensitive to inter-laboratory differences in the test protocol. All sites obtained comparable information on the qualitative effect of tramadol treatment in C57BL/6J mice, but the datasets differed because of differences in testing and housing conditions. In conclusion, our study confirms that the results of behavioural tests can be affected by many factors and can differ from one laboratory to another. Nevertheless, the evaluated parameters proved relatively robust even when conditions were not extensively harmonized, and they constitute useful tools for severity assessment. However, analgesia-related side effects on the parameters must be considered carefully.

Abstract
Evidence-based severity assessment is essential as a basis for the ethical evaluation of animal experiments, in order to ensure animal welfare, compliance with legal requirements and scientific quality. To fulfil these tasks, scientists as well as animal care and veterinary staff need assessment tools that allow species-relevant measurement of the physical and affective state of the animals. In a three-centre study, the inter-laboratory robustness of body-weight measurement, the Mouse Grimace Scale (MGS) and the burrowing test was evaluated. The parameters were examined in untreated and tramadol-treated female C57BL/6J mice. In all laboratories, tramadol treatment caused a loss of body weight, followed by an increase after the end of treatment. Tramadol treatment had no influence on MGS or burrowing behaviour. The results were qualitatively comparable between the laboratories but quantitatively significantly different (inter-laboratory analysis). Burrowing behaviour appeared to depend sensitively on inter-laboratory differences in the test protocol. All sites obtained comparable information on the qualitative effect of tramadol treatment in C57BL/6J mice; however, the datasets varied because of differing test and housing conditions. In summary, our study confirms that results of behavioural tests are influenced by many factors and can vary between individual laboratories. Nevertheless, the evaluated parameters proved relatively robust even under conditions that were not extensively harmonized. They can therefore serve as useful methods for severity assessment. The effects of analgesia on the parameters must, however, be examined carefully.

Resumen
Evidence-based severity assessment is essential as a basis for ethical evaluation in animal experimentation, in order to guarantee animal welfare, legal compliance and scientific quality. To meet these requirements, scientists, animal caretakers and veterinary staff need assessment tools that provide species-relevant measurements of the affective and physical state of the animals. In a three-centre study of inter-laboratory robustness, body-weight monitoring, the Mouse Grimace Scale (MGS) and the burrowing test were evaluated. The parameters were assessed in naive and tramadol-treated female C57BL/6J mice. During tramadol treatment, a loss of body weight was observed, followed by an increase once treatment ended. Tramadol treatment did not affect the MGS or burrowing. The results were qualitatively comparable between laboratories but quantitatively very different (inter-laboratory analysis). Burrowing behaviour appears to be more sensitive to inter-laboratory differences in the test protocol. All sites obtained comparable information on the qualitative effect of tramadol treatment in C57BL/6J mice; however, the datasets differed because of differences in testing and housing conditions. In conclusion, our study confirms that the results of behavioural tests can be affected by many factors and may differ between laboratories. Nevertheless, the evaluated parameters appear relatively robust even when conditions have not been extensively harmonized, and they represent useful tools for severity assessment. However, analgesia-related side effects on the parameters must be considered carefully.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 83–91
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219881655
journals.sagepub.com/home/lan

Improvement of the Mouse Grimace Scale set-up for implementing a semi-automated Mouse Grimace Scale scoring (Part 1)

Lisa Ernst1, Marcin Kopaczka2, Mareike Schulz1, Steven R Talbot3, L Zieglowski1, M Meyer1, S Bruch1, Dorit Merhof2 and Rene H Tolba1

Abstract The Mouse Grimace Scale (MGS) has been widely used for the noninvasive examination of distress/pain in mice. The aim of this study was to further improve its performance by developing automated, standardized picture generation for MGS scoring with simultaneous evaluation of up to four animals, in order to obtain repeatable, faster, blinded and reliable results. Videos of seven C57BL/6N mice were generated in an experiment to assess pain and stress induced by repeated intraperitoneal injection of carbon tetrachloride (CCl4). MGS scores were taken 1 h before and after the injection. Videotaping was performed for 10 min in special observation boxes. For manual selection, pictures of each mouse were randomly chosen for quality analysis and scored according to six quality selection criteria (0 = no, 1 = moderate, 2 = full accordance); the maximum possible score was 12. Overall, 609 pictures from six videos were evaluated for MGS scoring quality, using either the picture selection tool or manual scoring. With manual scoring, 288 pictures (48.3% of all randomly generated pictures) were deemed scorable using MGS (mean score = 22.15, SD 6.3). To evaluate the algorithm, ratings from different rater groups (beginner, medium-level trained, professional) were compared with the automated image selection. These differences were not significant (p = 0.1091). This study demonstrates an improved set-up and a picture selection tool that can generate repeatable, observer-unbiased and standardized pictures for MGS scoring.

Keywords behaviour, distress, ethics and welfare, MGS, pain measurement, severity

Date received: 27 May 2019; accepted: 16 September 2019

1Institute for Laboratory Animal Science and Experimental Surgery, RWTH Aachen University, Germany
2Institute of Imaging & Computer Vision, RWTH Aachen University, Germany
3Institute for Laboratory Animal Science and Central Animal Facility, Hannover, Germany
Corresponding author: Rene H. Tolba, Institute for Laboratory Animal Science & Experimental Surgery, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany. Email: [email protected]

The Mouse Grimace Scale (MGS) was invented by Langford et al. in 2010 as a behavioural test for the examination of distress/pain.1 This method has been widely used for the noninvasive behavioural examination of distress/pain in mice and many other species, such as horses,2 pigs,3 lambs,4 rabbits5 and rats.6 In the original MGS set-up by Langford et al.,1 pain and distress are measured by coding five different facial expressions. For evaluation of the MGS, mice are habituated to observation cages for up to 45 min and then filmed in the cages for 30 min. Afterwards, pictures from the video are selected (cropped and sorted) and scored manually.1 Other MGS generation methods recently demonstrated by different workgroups include real-time live scoring by a human examiner7 and manual capture of individual pictures within custom photo boxes.8 Because they lack automation, these methods are time-consuming, labour-intensive and difficult to standardize. To reduce this workload, Matsymia et al. presented the Rodent Face Finder® in their study of the extraction of pictures of rodents with pale fur.9
The aim of our study was to develop and further improve the performance of MGS in a one-step approach. The approach should generate repeatable, faster, blinded, non-observer-biased, standardized and randomized pictures and reliable results. The concept is also recommended for identifying mice with dark fur. Furthermore, this method allows up to four animals to be filmed and evaluated simultaneously. Automating the technical processes is intended to minimize subjective intervention and to avoid selection and performance bias.

Material and methods
Ethical statement
This animal study was performed in accordance with the Federal German law regarding the protection of animals. The study proposal was approved by the governmental animal care and use committee (LANUV, North Rhine-Westphalia, Germany, AZ: 84-02.04.2014.A417). It complied with institutional guidelines as well as The Guide for the Use of Laboratory Animals.10,11

Study design
MGS was evaluated as an integral part of a previous, unrelated animal-based study on severity assessment of CCl4 injection for the induction of liver fibrosis in mice. Seven male C57BL/6N mice (Mus musculus) weighing an average of 25 ± 2 g were randomly chosen from a group of 72 animals for video recording. To avoid any hormonal differences, only male animals were included. Information on housing conditions and hygiene management can be found in the supplemental material.

Experimental procedure
CCl4 was injected intraperitoneally to induce liver fibrosis, and the resulting pain and stress were assessed. To comply with the 3R principle, no additional experiments were conducted to evaluate MGS. The MGS videotaping was performed twice a day for three days a week, at exactly the same time in the morning.12

Technical setup
To generate pictures for MGS scoring, the animals were filmed for 10 min with a digital single-lens reflex camera (Canon EOS 700D, Canon Deutschland GmbH, Krefeld, Germany). Filming took place in special observation boxes (9 × 5 × 5 cm3), as described in the protocol used by Langford et al.1 Information about the construction of the MGS observation boxes can be found in the supplemental material. For simultaneous observation of four animals, a special rack was built to hold the boxes (Fig. 1). The rack was placed into a white filming tent (Fotozelt Lichtzelt Würfel 80 × 80 × 80 cm3; Softbox, PMS, Germany) whose sides, front and bottom were illuminated from the outside. The brightness within the observation boxes ranged between 250 and 430 lux, depending on the location.
Before the examination started, one of these boxes was placed into each animal's home cage for 5 min. This allowed the animals to adapt to its odour and minimized the stress of a new surrounding. To reduce odour, the boxes were cleaned between uses and disinfected at the end of the day with Antifect N liquid solution (Schülke & Mayr, Norderstedt, Germany).

Image collection
To establish the new generation and evaluation tool and to compare manual and automated image selection, images from the videos were generated both manually and by the picture selection tool. Seven animals from six randomly chosen videos were used as a subset for generating pictures for MGS. For manual image generation, VLC Media Player (32-bit Version 3.0.4) was used on a Windows 7 operating system to capture a picture whenever the animal looked into the camera. The pictures were named by date and by the position of the animal in the set-up to guarantee clear assignment.
Six quality criteria were used to manually select the pictures:
(1) the mouse should appear in profile or front view;
(2) at least one eye should be visible;
(3) nose and cheek bulges should be visible;
(4) the animal should be static and calm (not grooming or sniffing);
(5) at least one ear should be recognizable;
(6) the overall image quality should be good (the animal in focus, with no or low reflection, no cloudiness and good illumination) (Fig. 2).
All pictures were scored according to whether they met each criterion: 0 = no accordance with the criterion, 1 = little accordance, 2 = full accordance.

The maximum possible score for picture quality was 12 points. As a cut-off for quality selection, all images selected as scorable for MGS had to have a score of ≥7 points, with no criterion scored 0.
For automated image generation, the algorithm was 'trained' with approximately 700 images. Three different human raters had evaluated these images on the basis of the criteria described above.
The automated quality scoring was implemented as a pipeline. Semantic segmentation networks detected relevant facial areas, which deep convolutional networks for image classification then scored. In the segmentation stage, a network for semantic segmentation was 'trained' to segment the animals' ears in the images. Image crops of 220 × 220 pixels centred on the centroid of the animals' ears were then forwarded to a set of classification networks trained to score each of the previously defined criteria. As with manual quality scoring, the final quality score of an image was obtained by summing all individual scores and comparing the sum with the threshold of 7.
To show the difference between the MGS selection tool and the manual selection method, 50 pictures (mean, 47 ± 5.8) were randomly cut out per animal video. The generated images were then filed with an animal identifier and time in an Excel sheet (Microsoft Excel Version 2016 for Windows). They were subsequently evaluated blinded and in random order in a scoring tool.
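The selection rule described above (six criteria scored 0–2, total of at least 7 of 12 points, no criterion scored 0) can be expressed as a small decision function. The Python below is a minimal sketch of that rule only, not the authors' selection tool, and it contains no image processing.

The criterion labels and the `require_no_zero` switch are hypothetical names introduced here. The switch exists because the text states the no-zero condition for manual selection but mentions only the summed threshold for the automated pipeline.

```python
from typing import Sequence

# Hypothetical labels for the six quality criteria listed in "Image collection".
# Each criterion is scored 0 = no, 1 = little, 2 = full accordance (max total 12).
CRITERIA = (
    "profile_or_front_view",
    "eye_visible",
    "nose_and_cheek_visible",
    "static_and_calm",
    "ear_recognizable",
    "overall_image_quality",
)
THRESHOLD = 7  # minimum total for an image to count as MGS-scorable


def quality_score(scores: Sequence[int]) -> int:
    """Validate one image's per-criterion scores and return their sum."""
    if len(scores) != len(CRITERIA):
        raise ValueError(f"expected {len(CRITERIA)} scores, got {len(scores)}")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each criterion must be scored 0, 1 or 2")
    return sum(scores)


def is_scorable(scores: Sequence[int], require_no_zero: bool = True) -> bool:
    """Apply the cut-off: total >= THRESHOLD and, optionally, no criterion at 0."""
    total = quality_score(scores)
    if require_no_zero and 0 in scores:
        return False
    return total >= THRESHOLD
```

For example, `is_scorable([1, 1, 1, 1, 1, 2])` is `True` (7 points, no zeros). By contrast, `is_scorable([2, 2, 1, 1, 1, 0])` is `False`: it also reaches 7 points, but one criterion is scored 0.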

Figure 1. The rack is designed as a 2 × 2 row system with a height × width × depth of 34 × 11 × 12 cm³ and is made of transparent polycarbonate.

Figure 2. Examples of images excluded for not fulfilling the selection criteria. Pictures were scored on a three-level scale (0–2) for each of the six criteria and were excluded when any criterion was scored 0 or when the overall score was <7 points.

Evaluation process

To evaluate interrater reliability between image selection by the algorithm and manual selection, nine human raters (three beginners, three with medium-level training and three professionals) evaluated 116 images of three animals generated from two different videos. To assess the transferability of the algorithm to new data, these videos were different from the videos used to ‘train’ the algorithm. The beginners had no experience in handling or evaluating animals, or less than 2 years of experience with animals. The raters with medium-level training had between 2 and 5 years of experience in animal science and no or little experience in evaluating MGS images. The professionals had more than 5 years of experience in animal science, had good to very good knowledge of handling and stress assessment of animals, and were highly familiar with the MGS scoring system. All groups performed the scoring simultaneously after a 30-minute briefing on the test set-up.

Statistical analysis

Data were analysed with GraphPad Prism 7 software (GraphPad Software Version 7, La Jolla, CA, USA, www.graphpad.com) and R (R Foundation for Statistical Computing, Vienna).13 The intraclass correlation coefficient (ICC) was calculated with the icc function from the interrater reliability (irr) library.14 The model was specified as a two-way set-up with random column and row effects, and the type was set to agreement between raters; this yielded an ICC value with the corresponding F-values and 95% confidence intervals. A Kruskal–Wallis test was used to compare the responses of raters at different training levels. Fisher’s exact test was used to assess reliability between two different rater groups. Results were considered statistically significant at p < 0.05.

Results

MGS set-up

In total, 609 pictures were selected at random and scored for quality. Of these, 288 pictures fulfilled the described quality criteria for MGS scoring. For each animal, between nine and 31 pictures, with a mean of 22 pictures per video (SD = 6.3), were selected as scorable. Overall, these represented 48.3% of the randomly generated pictures. Of the pictures taken at positions 1, 2, 3 and 4 in the rack (Fig. 1), 50.8%, 48.1%, 45.1% and 41.5%, respectively, were scorable. There was no significant difference in scorability between the positions in the MGS set-up (Fig. 3).

Figure 3. Image quality for each position (Pos.) of the Mouse Grimace Scale (MGS) boxes in the rack. There was no significant difference in picture quality among the four positions (one-way ANOVA, F(3, 15) = 0.1847).

Irr analysis

The ICC levels of the different rater groups were not significantly different, and all rater groups were within the excellent range of agreement (Table 1). ICC values of <0.40, 0.40–0.59, 0.60–0.74 and ≥0.75 indicate poor, fair, good and excellent reliability, respectively.15 There was no significant difference in the distribution of quality scoring points for the images between the different training levels (Fig. 4).

Table 1. Intraclass correlation coefficient (ICC) between different rater groups (model: two-way; type: agreement).

| | F1 v. F2 | F1 v. F3 | F2 v. F3 | F1–F3 |
|---|---|---|---|---|
| Subjects | 600 | 600 | 600 | 600 |
| Raters | 2 | 2 | 2 | 3 |
| ICC(2) | 0.948 | 0.906 | 0.948 | 0.956 |
| F-test (H0: r0 = 0; H1: r0 > 0) | F(599, 600) = 19.2 | F(599, 600) = 10.7 | F(599, 600) = 19.1 | F(599, 1200) = 22.5 |
| p value | 5.17e-220 | 1.25e-153 | 2.29e-219 | 0 |
| 95% confidence interval for ICC population values | 0.939 < ICC < 0.956 | 0.89 < ICC < 0.92 | 0.939 < ICC < 0.955 | 0.949 < ICC < 0.961 |

F1: beginner; F2: trained; F3: professional.

Figure 4. Comparison of the distribution of quality scoring points for pictures with regard to rater training level. The Kruskal–Wallis test with 1 degree of freedom yielded a value of 1.7758; p = 0.1827.

Analysis between irr and the algorithm

Comparing the scores produced by the algorithm with those of all nine raters, the average overall agreement (identically selected pictures) was 74.14 ± 1.86%. Data were analysed with Cohen’s κ (kappa) (Fig. 5).16 Comparing the algorithm with the beginner group gave κ = 0.427, with the medium-level group κ = 0.362 and with the professional group κ = 0.449. Cohen’s κ can take values between –1 and 1; in this study, κ was considered moderate when it was >0.410. Fisher’s exact test was used to calculate the specificity and sensitivity of the algorithm against a professional rater, yielding a specificity of 0.84 and a sensitivity of 0.42 (data not shown).

Figure 5. Comparison of agreement between the algorithm and the different training levels, as well as the overall agreement. Data were analysed using Cohen’s κ.

Discussion

Although the MGS is often used and is a recognized method for classifying distress/pain, its implementation varies between working groups. Studies that implemented the MGS differ in the use and structure of the observation boxes, the length of the film recordings and the timing of image selection. For example, some groups use 80 × 80 × 80 mm³ photographic cubes, whereas others film the animals inside a box used as a photographic cuboid with dimensions of 22 × 29 × 39 cm³.8 In several studies, the recording time ranged from 10 min to over 30 min, and the selection time or criteria were often random or not specified.7–9 Such variations in behavioural tests can affect the comparability of results, so that more animals are needed to validate the results.17 This shows the need to standardize these tests. In the various studies, the minimum recording time for which animals were filmed individually ranged between 3 and 10 min.7,17 In larger studies with more animals, filming each animal individually multiplies the total filming time and therefore greatly increases the processing time for manual cutting and scoring of images. Similarly, the Rodent Face Finder® extracts the images from 30-min videos for manual evaluation, and the output images are then randomly inserted into a PowerPoint macro.6 Matsumiya et al.9 and Sotocinal et al.6 used the Rodent Face Finder® to identify white animals, which were recorded individually. Because ‘C57BL/6N (‘‘B6’’) mice are one of the oldest and most widely used inbred strains in biomedical research’,18 our aim was to implement image selection for animals with dark fur.

To reduce labour intensity as well as the stress on the animals, we decreased the filming time from 30 to 10 min.1 In addition, the animals were acclimatized by placing the MGS box in their home cage for 5 min before recording started. We decided against reducing the filming time below 10 min; instead, we opted to record up to four animals simultaneously. This decision allowed a better overview of the animals’ behaviour during the recordings and was based on the fact that recording times of <10 min are rarely reported.17 Red foil on all long sides of the observation boxes prevented the animals from seeing each other. We could not visually detect any influence of the animals on each other in the form of altered or increased social interaction, such as rearing, leaning toward the neighbouring box or increased sniffing, which Kim et al. described as a response to other animals.19 Moreover, observation times of more than 12 min subjectively resulted in a visibly increased respiration rate, manifested as fogging on the front of the observation boxes. On the one hand, a long observation period may increase the exploratory movement of the animals in their boxes, so that fewer accurate pictures can be obtained from the session. On the other hand, too short an observation period could increase the likelihood of capturing random behaviours. A filming duration of 10 min was long enough to habituate the animals to the boxes with no observed increase in visible stress behaviour and, at the same time, to obtain good pictures.

In contrast to our 10-min filming protocol and automated image selection, manual live scoring was described by Miller and Leach.7 In their study, the examiner scored the animal for 5 s during a 10-min observation period while sitting in front of the cage. In our view, live personal observation seems accurate but has two drawbacks: choosing an independent moment within a long 10-min period increases the risk of bias, and the method is highly observer-dependent and therefore highly subjective, as well as time-consuming. Because our 10-min recordings yielded several images, random image selection, unlike manual selection, is independent of observer choice. Our results demonstrated that almost 50% of the images randomly created with our set-up could be used for evaluation without further selection. This shows that the standardized set-up per se is a good basis for selecting images. We found that visual blockage by a red transparent foil minimizes the mutual influence of the animals. In addition, the foil improved the contrast of the animals’ dark coat against the white background while still letting enough light pass to brighten the MGS box, which can improve the assessment of the MGS criteria. When images were selected automatically, we found good agreement between the algorithm and manual image selection (Fig. 5).

To select the images with our protocol and the tool, an average manual processing time of 1.35 s per picture was required (data not shown). This is almost 21 times less than the time a person would spend selecting the images manually (28.2 s per picture) and can therefore significantly reduce labour and personnel working time. The study by Matsumiya et al. also showed that automatic generation of images can reduce selection bias and significantly decrease labour intensity.9

We are aware that only male animals from a previous study were used to train the algorithm. Using randomly selected animals from another study is in accordance with the 3R principle. However, intensive research has not revealed differences in grimace expression between female and male mice. We therefore conclude that the algorithm we developed may not be influenced by the sex of the animal.

Conclusion

Through automated image selection under specifically defined criteria with this set-up, images were selected independently of the observer’s personal assessment, and selection bias was thus reduced. Nevertheless, although pictures can be selected according to quality, their extraction from the video is random. The disadvantage is that deviations in behaviour in the video recordings cannot be identified. According to Amy L. Miller, this is a disadvantage of MGS scoring through image capture: because each picture captures only a moment, it cannot be ruled out that the animal accidentally blinked or showed exploratory behaviour in it.7 Our set-up takes care of this problem, because images in which the animals exhibit random movement behaviour, such as grooming or sniffing, are not selected. At the same time, the number of output images can be made as large as desired, so that the weight of random momentary images is minimized. Although these measures can minimize incorrect selection, they cannot exclude it completely.

Another disadvantage of MGS analysis is the large manual component in image evaluation. So far, even images selected by the algorithm still have to be evaluated manually. They can be displayed and evaluated in random order and blinded with the output of the tool, but automatic evaluation is not yet possible for all MGS criteria. Although our set-up minimized the time spent selecting images and allowed four animals to be observed simultaneously, the total processing time is adequate for long-term evaluation of severity but, because of the time delay, not for acute decisions about an animal’s condition.7 Future work should aim to generate MGS scores from live video analysis, making scoring independent of time delay and observer.20

Acknowledgements

The authors thank the technical facility for the construction of the set-up parts and observation cages, the volunteers who contributed to the evaluation and the technical assistants for their support.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iDs

Steven R Talbot https://orcid.org/0000-0002-9062-4065
S Bruch https://orcid.org/0000-0002-6381-7072
Rene H Tolba https://orcid.org/0000-0002-0383-3994

Supplemental Material

Supplemental material for this article is available online.

References

1. Langford DJ, Bailey AL, Chanda ML, et al. Coding of facial expressions of pain in the laboratory mouse. Nat Methods 2010; 7: 447–449.
2. Dalla Costa E, Minero M, Lebelt D, et al. Development of the Horse Grimace Scale (HGS) as a pain assessment tool in horses undergoing routine castration. PLoS One 2014; 9: e92281.
3. Viscardi AV, Hunniford M, Lawlis P, et al. Development of a Piglet Grimace Scale to evaluate piglet pain using facial expressions following castration and tail docking: A pilot study. Front Vet Sci 2017; 4: 51.
4. Guesgen MJ, Beausoleil NJ, Leach M, et al. Coding and quantification of a facial expression for pain in lambs. Behav Process 2016; 132: 49–56.
5. Hampshire V and Robertson S. Using the facial grimace scale to evaluate rabbit wellness in post-procedural monitoring. Lab Anim 2015; 44: 259–260.
6. Sotocinal SG, Sorge RE, Zaloum A, et al. The Rat Grimace Scale: A partially automated method for quantifying pain in the laboratory rat via facial expressions. Mol Pain 2011; 7: 55.
7. Miller AL and Leach MC. The Mouse Grimace Scale: A clinically useful tool? PLoS One 2015; 10: e0136000.
8. Hohlbaum K, Bert B, Dietze S, et al. Severity classification of repeated isoflurane anesthesia in C57BL/6JRj mice: Assessing the degree of distress. PLoS One 2017; 12: e0179588.
9. Matsumiya LC, Sorge RE, Sotocinal SG, et al. Using the Mouse Grimace Scale to reevaluate the efficacy of postoperative analgesics in laboratory mice. J Am Assoc Lab Anim Sci 2012; 51: 42–49.
10. National Research Council Committee for the Update of the Guide for the Care and Use of Laboratory Animals. Guide for the Care and Use of Laboratory Animals. 8th ed. Washington, DC: The National Academies Press, 2011.
11. Kilkenny C, Browne W, Cuthill IC, et al. Animal research: Reporting in vivo experiments: the ARRIVE guidelines. Br J Pharmacol 2010; 160: 1577–1579.
12. Liedtke C, Luedde T, Sauerbruch T, et al. Experimental liver fibrosis research: update on animal models, legal issues and translational aspects. Fibrogenesis Tissue Repair 2013; 6: 19.
13. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, 2018.
14. Gamer M, Lemon J and Fellows Puspendra Singh I. irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84.1. CRAN, 2010. https://CRAN.R-project.org/package=irr.
15. Cicchetti D. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994; 6: 284–290.
16. McHugh ML. Interrater reliability: The kappa statistic. Biochem Medica 2012; 22: 276–282.
17. Miller AL, Kitson GL, Skalkoyannis B, et al. Using the mouse grimace scale and behaviour to assess pain in CBA mice following vasectomy. Appl Anim Behav Sci 2016; 181: 160–165.
18. Bryant CD, Zhang NN, Sokoloff G, et al. Behavioral differences among C57BL/6 substrains: Implications for transgenic and knockout studies. Neurogenetics 2008; 22: 315–331.
19. Kim DG, Gonzales EL, Kim S, et al. Social interaction test in home cage as a novel and ethological measure of social behavior in mice. Exp Neurobiol 2019; 28: 247–260.
20. Kopaczka M, Ernst L, Schock J, et al. Introducing CNN-Based Mouse Grim Scale Analysis for Fully Automated Image-Based Assessment of Distress in Laboratory Mice. Granada, Spain: Eurographics Association, 2018, pp. 101–106.

Résumé (French)

The Mouse Grimace Scale (MGS) has been widely used to allow non-invasive assessment of distress or pain in mice. The aim of this study was to improve its performance in order to generate reliable results more quickly, reproducibly and blinded, for the automated and standardized production of images for MGS scoring and the simultaneous assessment of up to four animals. Videos of seven C57BL/6N mice were generated in an experiment designed to assess the pain and stress induced by repeated intraperitoneal injections of carbon tetrachloride (CCl4). MGS scores were recorded 1 h before and after the injection. Video recording was performed for 10 min in special observation boxes. For manual selection, images of each mouse were chosen at random for quality analysis. They were scored according to six quality selection criteria (0 = no, 1 = moderate, 2 = full compliance); the maximum possible score is 12. Overall, 609 images from six videos were assessed for MGS scoring quality; the assessment was performed using an image selection tool or by manual scoring. With manual scoring, 288 pictures (48.3% of all randomly generated images) were judged scorable using the MGS (mean score = 22.15 ± 6.3 SD). To evaluate the algorithm, the assessments of different groups of raters (beginner, intermediate, professionally trained) were compared with the automatically generated image. These differences were not significant (p = 0.1091). This study demonstrates an improved set-up and an image selection tool that can generate repeatable, observer-unbiased and standardized images for MGS scoring.

Abstract (German)

The Mouse Grimace Scale (MGS) is frequently used for the non-invasive examination of suffering/pain in mice. The aim of this study was to further improve MGS performance in order to generate repeatable, faster, blinded and reliable results for the development of automated and standardized images for MGS scoring and the parallel evaluation of up to four animals. Videos of seven C57BL/6N mice were generated in an experiment assessing the pain and stress caused by repeated intraperitoneal injection of carbon tetrachloride (CCl4). MGS scores were determined 1 hour before and 1 hour after each injection. Video recording was performed for 10 minutes in special observation boxes. For manual assessment, images of the individual mice were selected manually at random for quality analysis and rated according to six quality selection criteria (0 = no, 1 = moderate, 2 = full compliance); the highest possible score was 12. In total, 609 images from six videos were evaluated for MGS scoring quality; the evaluation was performed with the image selection tool or manually. In manual scoring, 288 images (48.3% of all randomly generated images) were classified as scorable with the MGS (mean = 22.15 ± 6.3 SD). To evaluate the algorithm, ratings from different rater groups (beginner, intermediate, professional) were compared with the automatically generated image. These differences were not significant (p = 0.1091). This study demonstrates an improved set-up and an image selection tool for generating repeatable, observer-independent and standardized images for MGS evaluation.

Resumen (Spanish)

The Mouse Grimace Scale (MGS) has been used extensively for the non-invasive examination of pain/discomfort in mice. The aim of this study was to improve its performance in order to generate repeatable, faster, blinded and reliable results for the development of standardized and automated images for MGS scoring and the simultaneous evaluation of up to four animals. Videos of seven C57BL/6N mice were generated in an experiment to evaluate the pain and stress induced by repeated intraperitoneal injections of carbon tetrachloride (CCl4). MGS scores were taken 1 hour before and after the injection. Video recording was performed for 10 minutes in special observation boxes. For manual selection, random pictures of each mouse were chosen for quality analysis and scored according to six quality selection criteria (0 = no, 1 = moderate, 2 = full compliance); the maximum possible score was 12. Overall, 609 pictures from six videos were evaluated for quality according to MGS scoring; the evaluation was performed using the picture selection tool or by manual scoring. With manual scoring, 288 pictures (48.3% of all randomly generated pictures) were considered scorable using the MGS (mean score = 22.15 ± 6.3 SD). To evaluate the algorithm, ratings from different groups of raters (beginner, intermediate training level, professional) were compared with the automatically generated image. These differences were not significant (p = 0.1091). This study demonstrates an improved framework and a picture selection tool that can generate standardized, repeatable and unbiased pictures for MGS scoring.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 92–98
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0023677219881664
journals.sagepub.com/home/lan

Semi-automated generation of pictures for the Mouse Grimace Scale: A multi-laboratory analysis (Part 2)

Lisa Ernst1, Marcin Kopaczka2, Mareike Schulz1, Steven R Talbot3, Birgitta Struve3, Christine Häger3, André Bleich3, Mattea Durst4, Paulin Jirkof4, Margarete Arras4, Roelof Maarten van Dijk5, Nina Miljanovic5,6, Heidrun Potschka5, Dorit Merhof2 and Rene H Tolba1

Abstract The Mouse Grimace Scale (MGS) is an established method for estimating pain in mice during animal studies. Recently, an improved and standardized MGS set-up and an algorithm for automated and blinded output of images for MGS evaluation were introduced. The present study evaluated the application of this standardized set-up and the robustness of the associated algorithm at four facilities in different locations and as part of varied experimental projects. Experiments using the MGS performed at four facilities (F1–F4) were included in the study; 200 pictures per facility (100 rated as positive and 100 as negative by the algorithm) were evaluated by three raters for image quality and reliability of the algorithm. At three of the four facilities, sufficient image quality and consistency were demonstrated. The intraclass correlation coefficient, calculated to assess the correlation among raters at the three facilities (F1–F3), showed excellent correlation. The specificity and sensitivity of the results obtained by different raters and the algorithm were analysed using Fisher’s exact test (p < 0.05). The analysis indicated a sensitivity of 77% and a specificity of 64%. The results of our study showed that the algorithm performed robustly at facilities in different locations when our MGS set-up was strictly applied.

Keywords MGS, Mouse Grimace Scale, pain assessment, severity assessment

Date received: 4 August 2019; accepted: 19 September 2019

1Institute for Laboratory Animal Science, RWTH Aachen University, Based on the developments in experimental animal Germany 2Institute of Imaging & Computer Vision, RWTH Aachen University, research and implementation of the European Union 1 Germany Directive (2010/63 EU) on the protection of animals 3Institute for Laboratory Animal Science and Central Animal used for scientific purposes, ensuring the highest pos- Facility, Hannover, Germany sible level of animal well-being has become a major 4Anaesthesia and Perioperative Pain Research, University of priority in animal studies. Article 15 of this directive Zurich, Switzerland 5Institute of Pharmacology, Toxicology and Pharmacy, Ludwig- mandates a severity assessment of each procedure in 1 Maximilians-University, Germany an animal study. Based on this requirement, methods 6Graduate School of Systemic Neurosciences, GSN LMU Munich, of evaluating any changes in animal well-being and Germany estimating potential suffering are necessary. Because most rodent species used in animal research Corresponding author: Rene H Tolba, Institute for Laboratory Animal Science & are flight and prey animals,2,3 these animals avoid 4 Experimental Surgery, University Hospital RWTH Aachen, overtly exhibiting signs or vocalizing their pain. Due Pauwelsstraße 30, 52074 Aachen, Germany. to this lack of self-indication of pain severity by the Email: [email protected] Ernst et al. 93 animals,4 objective criteria must be implemented to At F2, a project on pain severity assessment after assess pain severity. inducing liver fibrosis was conducted. Fibrosis was Researchers have employed various methods for this induced using CCl4 dissolved in germ oil. Male purpose, including clinical examinations and scoring C57Bl/6N mice were intraperitoneally injected (50 ml) 5 sheets, specific stress parameter evaluations and either with 0.6 ml/kg CCl4 in mixed germ oil or germ behavioural tests.6–8 oil only (control) three times per week for 4 weeks. 
The Mouse Grimace Scale (MGS), a noninvasive MGS scoring was performed 1 h before and after injec- method of visually recognizing pain on the basis of tions. Baseline measurements were taken before start- facial expressions of mice,9 has become an established ing the induction of liver fibrosis. method for identifying acute pain in mice and has been At F3, a project on the effects of intraperitoneal trans- repeatedly used in animal experiments.3,10,11 In the ori- mitter implantation or a corresponding SHAM oper- ginal publication of Langford et al.,9 the MGS pictures ation on different clinical and behavioural parameters for analysis were cropped from pre-recorded videos, in female C57Bl/6J mice was conducted. To differentiate selected and then scored manually. between the effects of the surgical procedure itself and In a recent study, our group could improve the the transmitter, SHAM-operated mice were monitored MGS set-up by video recording up to four animals sim- as a control group. MGS scoring was performed at 30 ultaneously. Additionally, a tool for automated image and 180 mins after surgery on the same day and on days selection for blinded MGS analysis was introduced 1, 2, 3, 5 and 7. (Ernst et al.,).12 Automation and standardization of At F4, male and female mice, wild-types for A1783V technical processes is necessary to minimize subjective mutation and with or without the presence of Cre were influences and avoid selection and performance biases. used for animal experiments.13 All mice underwent sur- The aim of the present study was to investigate the gery during which a telemetry device (HD-X02, DSI, St application of the modified and improved MGS set- Paul, USA) was subcutaneously implanted and an elec- up and the robustness of the automated process in a trode was implanted into the hippocampus. The elec- multi-laboratory analysis. 
The application and con- trode leads were fixed with three screws in the skull and formity of the set-up and image selection tool were covered with paladur (HeraeusÕ, Hanau, Germany). also assessed. Baseline measurements were taken 1 day prior to sur- gery after a habituation phase of 10 mins. Video record- ings were taken 1 h prior to surgery and at 1, 3, 6, 25 Materials and methods and 49 h following gain of consciousness after surgery. Study design All the studies were conducted in accordance with the legal requirements, and anaesthesia and analgesia To confirm the applicability of the improved MGS interventions were obtained if appropriate. Human set-up and automated image selection tool (Ernst endpoint protocols were applied for all the studies. et al., under review), animal research studies at four Additional details concerning procedures or surgeries experimental facilities (F1–F4) at different locations can be found in the supplementary material. employed this approach. Most of the studies were part of assessments that evaluated the severity of pro- Ethical statement cedures using animals for scientific purposes. The recently introduced modified MGS set-up was analo- MGS evaluations were a preliminary part of the experi- gously implemented in all the studies. ments. According to the 3R principles,14 no additional At F1, the MGS set-up was implemented in a refine- animals were used to perform this study. All the studies ment study on the possible benefits of infiltration with were conducted in accordance with EU Directive (2010/ local anaesthetics (lidocaine–bupivacaine) in combin- 63 EU) and the legal provisions of German Animal ation with systemic administration via Welfare Act (TierSchG).15 drinking water compared with systemic analgesia only Cantonal Veterinary Office, Zurich, Switzerland, after surgery. 
Male and female C57Bl/6J mice were subjected to a minor laparotomy and treated with a combination of local and systemic analgesia, local or systemic analgesia alone, or anaesthesia and systemic analgesia only. Among other behavioural tests, the MGS was used to assess changes in animal well-being and potential pain severity. Baseline measurements were taken at the same times of day as the postoperative measurements: 1, 6 and 24 h after surgery.

approved the animal housing and experimental procedures for the project at F1 under licence number 097/2017. For the project at F2, the licence was granted by the Governmental Animal Care and Use Committee (Landesamt für Natur, Umwelt und Verbraucherschutz, LANUV; AZ: 84-02.04.2014.A417, North Rhine-Westphalia, Germany). All experiments performed at F3 were approved by the Niedersächsisches Landesamt für Landwirtschaft und Lebensmittelsicherheit (LAVES) under licence number 15/1905. For the project at F4, all investigations were approved by the Government of Upper Bavaria (licence number 55.2-1-54-2532-168-2016).

Animals
Two facilities (F1 and F3) used C57Bl/6J mice, one facility (F1) used C57Bl/6N mice, and one facility (F4) used transgenic mice with a C57Bl/6 background, selected according to the researchers' interest in their main study. Sex, age and strain were chosen independently of the present study. Only a black coat colour and the use of adult animals were indicated as relevant to the present study. Additional information on housing and husbandry conditions according to the ARRIVE guidelines16 can be found in the supplementary material.

MGS set-up
With the modified and improved MGS video recording set-up, four animals could be filmed simultaneously under standardized conditions. To maximize the quality of MGS pictures, four equally sized MGS boxes (9 cm × 5 cm × 5 cm), placed in an observation rack inside a light tent, were illuminated from the side, bottom and front. Additional air holes were drilled into the front and the lid of the boxes to reduce fogging. The recording time was approximately 5 min at F1 and 10 min at F2, F3 and F4.

For automated analysis, box positions in the videos were defined manually and 300 images from each box were extracted automatically. The algorithm then analysed the extracted box images using a fully convolutional architecture17 to detect the position and size of the animals' eyes. Eye areas in the images were measured automatically, and all images in which the largest visible eye had an area of at least 100 pixels were considered suitable for MGS scoring (image size: approximately 500 × 500 pixels). From these images, 10 images per animal were randomly selected by the algorithm for manual scoring. For this purpose, we developed a tool that displays the images of all trials and animals in a randomized and blinded manner to minimize bias.

Evaluation process
At each facility, 100 images rated as positive and 100 images rated as negative by the algorithm were selected for image-quality evaluation; therefore, a total of 200 images per facility were evaluated. Images rated as positive were suitable for MGS scoring, whereas those rated as negative could not be used. Image quality was evaluated by three raters at three facilities. All raters had comparable experience in performing and assessing MGS images. The criteria for fulfilling individual evaluation points were discussed with all participants in advance.

For manual selection, a maximum score of 54 points could be achieved by fulfilling all positive criteria. Six evaluation criteria (mouse in profile, eyes recognizable, ears recognizable, nose recognizable, mouse in steady position and general image quality) were assessed, each with a maximum of nine points. The quality gradations within each criterion were: 1–3 = poor, 4–6 = moderate and 7–9 = excellent. If an image did not fulfil a criterion, that criterion was given a score of 1. As a cut-off, images with a score of 30 (that is, 55% of the maximum score) or with a score of 1 for any evaluation criterion were rated as negative. In this study, the algorithm focused mainly on detecting the eye.

Statistical analysis
GraphPad Prism (Version 7, La Jolla, California, USA, www.graphpad.com) and R software (version 3.4.1)18 were used for data analysis. The intraclass correlation coefficient (ICC), an estimate of inter-rater reliability, was calculated with the icc function of the interrater reliability (irr) library18, using a two-way ANOVA model to assess agreement. Fisher's exact test was used to analyse specificity and sensitivity. To identify reasons for false positive assignments by the algorithm, data were analysed using one-way ANOVA and Tukey's multiple comparison test. Results were considered statistically significant at p < 0.05.

Results
The number of scorable images at the facilities and their distribution are presented in Figure 1. For each facility, 100 positive and 100 negative images were selected by the algorithm and evaluated by raters for image quality. Examples of such images from each facility are shown in Figure 1. At F1, an average of 97 images (standard deviation (SD) = 17.08, n = 3, where n is the number of raters (one rater per facility)) were suitable for MGS evaluation; at F2, an average of 86.25 images (SD = 9.52, n = 3); and at F3, an average of 96 images (SD = 27.67, n = 3). Because the set-up at F4 deviated from the initial protocol in terms of colour and illumination, videos from F4 could not be included in the image-quality evaluation.
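The image-selection step described under MGS set-up (keep frames whose largest detected eye covers at least 100 pixels, draw 10 of them per animal at random, and present them to raters in shuffled order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fully convolutional eye detector is not reproduced, and its output is assumed to be available as a list of eye areas in pixels per extracted frame. The names `Frame`, `is_scorable`, `select_for_scoring` and `blinded_order` are hypothetical.

```python
import random
from dataclasses import dataclass, field

MIN_EYE_AREA_PX = 100   # minimum area of the largest visible eye, as stated in the text
IMAGES_PER_ANIMAL = 10  # images per animal passed on for manual scoring


@dataclass
class Frame:
    """One extracted box image (hypothetical structure)."""
    animal_id: str
    frame_index: int
    # Areas (pixels) of all eyes an eye detector found in this frame.
    # The detector is assumed to have run already; it is not part of this sketch.
    eye_areas_px: list = field(default_factory=list)


def is_scorable(frame: Frame, min_area: float = MIN_EYE_AREA_PX) -> bool:
    """A frame is suitable for MGS scoring if its largest visible eye covers >= min_area pixels."""
    return bool(frame.eye_areas_px) and max(frame.eye_areas_px) >= min_area


def select_for_scoring(frames, per_animal: int = IMAGES_PER_ANIMAL, seed=None) -> dict:
    """Keep scorable frames, then draw up to `per_animal` of them at random for each animal."""
    rng = random.Random(seed)
    candidates = {}
    for frame in frames:
        if is_scorable(frame):
            candidates.setdefault(frame.animal_id, []).append(frame)
    return {
        animal: rng.sample(pool, min(per_animal, len(pool)))
        for animal, pool in candidates.items()
    }


def blinded_order(selection: dict, seed=None) -> list:
    """Pool all selected frames and shuffle them so raters see no animal or time-point grouping."""
    rng = random.Random(seed)
    pooled = [frame for frames in selection.values() for frame in frames]
    rng.shuffle(pooled)
    return pooled
```

Animals with no scorable frame are simply absent from the result of `select_for_scoring`, and an animal with fewer than 10 scorable frames contributes all of them. A real blinding tool would also hide animal and time-point labels from the rater; the shuffle only removes ordering cues.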

Figure 1. Examples of images from all facilities: F1, top left; F2, top right; F3, bottom left; and F4, bottom right. The number of scorable images at each facility is shown; within each facility, the data show a bimodal distribution. The cut-off value is indicated by a dashed line. The data distribution for F4 cannot be displayed because differences in colour, brightness and the presence of a head implant prevented the algorithm from generating adequate images without further adjustment.
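The manual image-quality rule described under Evaluation process (six criteria scored 1–9, a maximum of 54 points, and an image rated negative at the cut-off of 30 points or when any single criterion scores 1) can be expressed as a small function. This sketch rests on one stated assumption: the text says images "with a score of 30" were rated negative, which is read here as a total of 30 or less. The names `grade`, `rate_image` and `CUT_OFF` are hypothetical.

```python
CRITERIA = (
    "mouse in profile",
    "eyes recognizable",
    "ears recognizable",
    "nose recognizable",
    "mouse in steady position",
    "general image quality",
)
MAX_PER_CRITERION = 9
MAX_TOTAL = MAX_PER_CRITERION * len(CRITERIA)  # 54 points
CUT_OFF = 30  # roughly 55% of MAX_TOTAL; "at or below" is an assumed reading


def grade(points: int) -> str:
    """Map a criterion score to the quality gradation used by the raters."""
    if not 1 <= points <= MAX_PER_CRITERION:
        raise ValueError(f"criterion score must be 1-{MAX_PER_CRITERION}, got {points}")
    if points <= 3:
        return "poor"
    if points <= 6:
        return "moderate"
    return "excellent"


def rate_image(scores: dict, cut_off: int = CUT_OFF) -> str:
    """Return 'positive' (usable for MGS scoring) or 'negative' for one image."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        grade(scores[name])  # validates the 1-9 range
    total = sum(scores[name] for name in CRITERIA)
    # Negative if the total is at or below the cut-off (assumed reading of the text),
    # or if any single criterion was not fulfilled (scored 1).
    if total <= cut_off or any(scores[name] == 1 for name in CRITERIA):
        return "negative"
    return "positive"
```

Under this reading, an image with all six criteria scored 5 (total 30) is rejected, one with all criteria scored 6 (total 36) passes, and an otherwise perfect image with "nose recognizable" scored 1 is still rejected.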

Table 1. Results of the analysis of intraclass correlation coefficient (ICC) values and their 95% confidence intervals, as determined using a two-way ANOVA. r0 is the value specified by the null hypothesis (H0: r = r0); H1: r > r0 denotes that a one-sided F-test was performed.

Model: two-way; type: agreement

| | F1 v. F2 | F1 v. F3 | F2 v. F3 | F1 v. F3 |
|---|---|---|---|---|
| Subjects | 600 | 600 | 600 | 600 |
| Raters | 2 | 2 | 2 | 2 |
| ICC(2) [95% CI] | 0.95 [0.94, 0.96] | 0.91 [0.89, 0.92] | 0.95 [0.94, 0.96] | 0.966 [0.95, 0.96] |
| F-test, H0: r0 = 0; H1: r0 > 0 | F(599, 600) = 19.2 | F(599, 600) = 10.7 | F(599, 600) = 19.1 | F(599, 1200) = 22.5 |
| p value | <0.05 | <0.05 | <0.05 | <0.05 |
| 95% confidence interval for ICC population values | 0.939 < ICC < 0.956 | 0.89 < ICC < 0.92 | 0.939 < ICC < 0.955 | 0.949 < ICC < 0.961 |
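Table 1 reports two-way, agreement-type ICCs computed with the irr package. The single-rater absolute-agreement coefficient, ICC(A,1) in McGraw and Wong's notation, can be written in terms of the two-way ANOVA mean squares for subjects (MS_R), raters (MS_C) and residual error (MS_E) for n subjects and k raters: ICC = (MS_R - MS_E) / (MS_R + (k - 1)·MS_E + k·(MS_C - MS_E)/n). The pure-Python sketch below implements that formula for a complete subjects × raters table. It is illustrative only: it omits the F-test and confidence interval and is not the irr implementation used in the study; the function name is hypothetical.

```python
def icc_agreement_single(ratings):
    """ICC(A,1): two-way model, absolute agreement, single rater.

    `ratings` is a list of rows: one row per subject (image), one column per rater.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least two raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_error) / denominator
```

For example, `icc_agreement_single([[1, 2], [5, 6], [9, 10]])`, where the second rater always scores one point higher, gives 32/33 ≈ 0.97. A consistency-type ICC would ignore such a constant offset, whereas the agreement-type coefficient used in Table 1 penalizes it.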

ICC, which was calculated to demonstrate the correlation between raters at the three facilities (F1–F3), showed an excellent correlation (Table 1);19 however, no significant differences were detected between the correlation coefficients. The comparison of the results obtained by the different raters and the algorithm, analysed using Fisher's exact test (p < 0.05), indicated a sensitivity of 77% and a specificity of 64% (Table 2).

Table 2. Specificity and sensitivity of the algorithm.

| Measure | Value | 95% confidence interval |
|---|---|---|
| Sensitivity | 0.7774 | 0.7245–0.8226 |
| Specificity | 0.6425 | 0.5944–0.6879 |
| Positive predictive value | 0.5983 | 0.5466–0.6479 |
| Negative predictive value | 0.8082 | 0.7613–0.8477 |

Among the 600 rated images, an average of 88.67 (SD = 30.16) images assessed by the algorithm were
evaluated as false positives compared with those assessed by the raters. A total of 44 similar images were rated as false positives by the raters compared with the algorithm. For images evaluated as false positives by the algorithm, the number of rejected images per evaluation criterion is presented in Figure 2. The results showed significant differences for the 'nose recognizable' and 'mouse in steady position' criteria.

Figure 2. The number of rejected images per criterion, with median and 95% confidence interval, calculated using a one-way ANOVA: F(6, 14) = 18.67; *Tukey's multiple comparisons test, adjusted p value < 0.05.

Discussion
The present study evaluated the applicability and robustness of the modified MGS set-up and the algorithm for image selection. Figure 1 shows that the standardized set-up is applicable and reproducible and that a sufficient number of images were deemed suitable for analysis at three facilities in different locations.

Reproducibility is an important attribute in the performance of animal experiments.20 Vasilevsky et al. have demonstrated that the applicability of scientific methods depends on the ability to reproduce other studies and build on previous work, and they noted that insufficient reporting of methodological details considerably reduces this reproducibility.21 Our findings support this conclusion: deviations from the standardized protocol (e.g. colour and illumination level) at one facility made it impossible for the algorithm to evaluate images without adapting it to that particular set-up.

Miller and Leach described differences in MGS values between mouse strains and sexes.22 To investigate the selection criteria for MGS images, we deliberately included animals of different strains that have a C57Bl/6 background and represent both sexes. In the selection of the images, we did not detect any sex-specific selection criteria for MGS. In principle, the algorithm performs well, with a sensitivity of 77% and a specificity of 64%. The reduction in false positives supports the use of this algorithm. With regard to specificity, the 'nose recognizable' and 'mouse in steady position' criteria are decisive for the selection of false positive results (Figure 2). The 'nose recognizable' criterion appears to be assessed less reliably; this has also been reported by other studies and may not depend on the algorithm alone.23

Conclusion
The present study demonstrated the applicability of the improved MGS set-up and the functionality of the associated algorithm at three facilities in different locations. Limitations in the specificity of the algorithm, especially its failure to detect moving animals, are currently being addressed by improving the algorithm, which should reduce the number of images rated as false positives in future studies (Kopaczka et al.,24 submitted to the 2019 Annual International Conference of EMBC).

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was funded in part by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) FOR 2591 Consortium and ME3737/18-1. Reference numbers are listed in the supplementary material.

ORCID iDs
Steven R Talbot https://orcid.org/0000-0002-9062-4065
Christine Häger https://orcid.org/0000-0002-6971-9780
André Bleich https://orcid.org/0000-0002-3438-0254
Paulin Jirkof https://orcid.org/0000-0002-7225-2325
Heidrun Potschka https://orcid.org/0000-0003-1506-0252
Rene H Tolba https://orcid.org/0000-0002-0383-3994

References
1. EU EP. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes. Document 32010L0063. European Parliament, 22 September 2010.

2. Faller KM, McAndrew DJ, Schneider JE, et al. Refinement of analgesia following thoracotomy and experimental myocardial infarction using the Mouse Grimace Scale. Exp Physiol 2015; 100: 164–172.
3. Matsumiya LC, Sorge RE, Sotocinal SG, et al. Using the Mouse Grimace Scale to re-evaluate the efficacy of postoperative analgesics in laboratory mice. JAALAS 2012; 51: 42–49.
4. Kanzler S, Rix A, Czigany Z, et al. Recommendation for severity assessment following liver resection and liver transplantation in rats: Part I. Lab Anim 2016; 50: 459–467.
5. Morton DB and Griffiths PH. Guidelines on the recognition of pain, distress and discomfort in experimental animals and a hypothesis for assessment. Vet Rec 1985; 116: 431–436.
6. Seibenhener ML and Wooten MC. Use of the Open Field Maze to measure locomotor and anxiety-like behavior in mice. JoVE 2015; e52434.
7. Peng M, Zhang C, Dong Y, et al. Battery of behavioral tests in mice to study postoperative delirium. Sci Rep 2016; 6: 29874.
8. Pratt D, Fuchs PN and Sluka KA. Assessment of avoidance behaviors in mouse models of muscle pain. Neuroscience 2013; 248: 54–60.
9. Langford DJ, Bailey AL, Chanda ML, et al. Coding of facial expressions of pain in the laboratory mouse. Nat Methods 2010; 7: 447–449.
10. Miller AL, Kitson GL, Skalkoyannis B, et al. Using the mouse grimace scale and behaviour to assess pain in CBA mice following vasectomy. Appl Anim Behav Sci 2016; 181: 160–165.
11. Miller AL and Leach MC. Using the mouse grimace scale to assess pain associated with routine ear notching and the effect of analgesia in laboratory mice. Lab Anim 2015; 49: 117–120.
12. Ernst L, Kopaczka M, Schulz M, et al. Improvement of the Mouse Grimace Scale set-up for implementing a semi-automated Mouse Grimace Scale scoring (Part 1). Lab Anim 2020; 54: 83–91.
13. Tang SH, Silva FJ, Tsark WM, et al. A Cre/loxP-deleter transgenic line in mouse strain 129S1/SvImJ. Genesis 2002; 32: 199–202.
14. Russell WMS and Burch RL. The Principles of Humane Experimental Technique. London: Methuen, 1959.
15. Tierschutzgesetz (German Animal Welfare Act) in the version published on 18 May 2006 (BGBl. I S. 1206, 1313), as last amended by Article 1 of the Act of 17 December 2018 (BGBl. I S. 2586).
16. Kilkenny C, Browne W, Cuthill IC, et al. Animal research: Reporting in vivo experiments: the ARRIVE guidelines. Br J Pharmacol 2010; 160: 1577–1579.
17. Navab N, Hornegger JM, Wells W, et al. Medical Image Computing and Computer-Assisted Intervention. In: Proceedings of the 18th International Conference MICCAI, Munich, Germany, October 5–9, 2015, Part III. 2015, pp. 234–241. Germany: Springer International Publishing AG Switzerland.
18. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2018.
19. Cicchetti D. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994; 6: 284–290.
20. Tilson HA and Schroeder JC. Reporting of results from animal studies. Environ Health Perspect 2013; 121: A320–321.
21. Vasilevsky NA, Brush MH, Paddock H, et al. On the reproducibility of science: Unique identification of research resources in the biomedical literature. PeerJ 2013; 1: e148.
22. Miller AL and Leach MC. The Mouse Grimace Scale: A clinically useful tool? PLoS One 2015; 10: e0136000.
23. Dalla Costa E, Pascuzzo R, Leach MC, et al. Can grimace scales estimate the pain status in horses and mice? A statistical approach to identify a classifier. PLoS One 2018; 13: e0200339.
24. Kopaczka M, Tillmann D, Ernst L, et al. Assessment of Laboratory Mouse Activity in Video Recordings Using Deep Learning Methods. In: International Engineering in Medicine and Biology Conference (EMBC), Berlin, Germany, 2019, Paper FrA15.2.

Résumé
The Mouse Grimace Scale (MGS) is a recognized method for assessing pain in mice during animal studies. Recently, an improved and standardized set-up and an algorithm for automation and blinded image production were introduced for MGS evaluation. The present study evaluated the applicability of this standardized set-up and the robustness of the associated algorithm at four facilities in different locations and within various experimental projects. Experiments using the MGS performed at four facilities (F1–F4) were included in the study; 200 images per facility (100 images each, defined as positive and negative by the algorithm) were evaluated by three raters with respect to image quality and the reliability of the algorithm. At three of the four facilities, image quality and consistency were demonstrated. The intraclass correlation coefficient, calculated to demonstrate the correlation between raters at the three facilities (F1–F3), showed an excellent correlation. The specificity and sensitivity of the results obtained by different raters and the algorithm were analysed using Fisher's exact test (p < 0.05). The analysis revealed a sensitivity of 77% and a specificity of 64%. The results of our study showed that the algorithm demonstrated robust performance at facilities in different locations under strict application of our MGS set-up.

Abstract
The Mouse Grimace Scale (MGS) is an established method for assessing pain in mice in animal experiments. Recently, an improved and standardized MGS set-up and an algorithm for the automated and blinded generation of images for MGS evaluation were introduced. The present study assessed the applicability of this standardized set-up and the robustness of the associated algorithm at four facilities in different locations and within various experimental projects. Experiments with the MGS performed at four facilities (F1–F4) were included in the study; 200 images per facility (100 images each, rated as positive and negative by the algorithm) were evaluated by three raters with respect to image quality and the reliability of the algorithm. At three of the four facilities, sufficient image quality and consistency were demonstrated. The intraclass correlation coefficient, calculated to show the correlation between raters at the three facilities (F1–F3), revealed an excellent correlation. The specificity and sensitivity of the results obtained by different raters and the algorithm were analysed with Fisher's exact test (p < 0.05). The analysis revealed a sensitivity of 77% and a specificity of 64%. The results of our study demonstrate that the algorithm performed robustly at facilities in different locations when our MGS set-up was applied consistently.

Resumen
The Mouse Grimace Scale (MGS) is an established method for estimating pain in mice during animal studies. Recently, an improved and standardized MGS set-up and an algorithm for automated and blinded image production for MGS evaluation were introduced. The present study evaluated the applicability of this standardized set-up and the robustness of the related algorithm at four facilities in different locations and as part of various experimental projects. Experiments that used the MGS and were carried out at our facilities (F1–F4) were included in the study; 200 images per facility (100 images each, rated as positive or negative by the algorithm) were evaluated by three raters for image quality and the reliability of the algorithm. At three of the four facilities, sufficient image quality and consistency were demonstrated. The intraclass correlation coefficient, calculated to demonstrate the correlation between raters at the three facilities (F1–F3), showed an excellent correlation. The specificity and sensitivity of the results obtained by different raters and the algorithm were analysed using Fisher's exact test (p < 0.05). The analysis indicated a sensitivity of 77% and a specificity of 64%. The results of our study showed that the algorithm achieved robust performance at facilities in different locations under strict application of our MGS set-up.

Special Issue: Severity Assessment
Laboratory Animals 2020, Vol. 54(1) 99–110
© The Author(s) 2019
DOI: 10.1177/0023677219883319
journals.sagepub.com/home/lan

Defining body-weight reduction as a humane endpoint: a critical appraisal

Steven R Talbot1, Svenja Biernot1, Andre Bleich1, Roelof Maarten van Dijk2, Lisa Ernst3, Christine Häger1, Simeon Oscar Arnulfo Helgers4, Babette Koegel3, Ines Koska2, Angela Kuhla5, Nina Miljanovic2, Franz-Tassilo Müller-Graff5, Kerstin Schwabe4, Rene Tolba3, Brigitte Vollmar5, Nora Weegh1, Tjark Wölk5, Fabio Wolf2, Andreas Wree6, Leonie Zieglowski3, Heidrun Potschka2* and Dietmar Zechner5*

Abstract
In many animal experiments, scientists and local authorities define a body-weight reduction of 20% or more as severe suffering and thereby as a potential parameter for humane endpoint decisions. In this study, we evaluated distinct animal experiments in multiple research facilities and assessed whether a 20% body-weight reduction is a valid humane endpoint criterion in rodents. In most experiments (restraint stress, distinct models for epilepsy, pancreatic resection, liver resection, caloric restrictive feeding and a mouse model for Dravet syndrome), the animals lost less than 20% of their original body weight. In a glioma model, a rapid decline in body weight of less than 20% was observed to be a reliable predictor of clinical deterioration. In contrast, after induction of chronic diabetes or acute colitis, some animals lost more than 20% of their body weight without exhibiting major signs of distress. In these two animal models, exclusive application of the 20% weight-loss criterion for euthanasia might therefore result in an unnecessary loss of animals. However, we also confirmed that this criterion can be a valid parameter for defining the humane endpoint in other animal models, especially when it is combined with additional criteria for evaluating distress. In conclusion, our findings strongly suggest that experiment- and model-specific considerations are necessary for the rational integration of the parameter 'weight loss' into severity assessment schemes and humane endpoint criteria. A flexible implementation, tailored to the experiment or intervention by scientists and authorities, is therefore highly recommended.

Keywords rodent, severity, euthanasia, stress, distress

Date received: 18 February 2019; accepted: 13 September 2019
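The criterion appraised in this study is relative: body weight on a given day is compared with a reference weight (for example the starting or pre-surgical weight), and a loss of 20% or more triggers humane-endpoint consideration. A minimal Python sketch of that arithmetic follows. The function names, the day-to-weight dictionary input and the inclusive "at least 20%" comparison are assumptions for illustration; as the study argues, such a flag would normally be combined with other distress criteria rather than used alone.

```python
def percent_change(weight: float, reference: float) -> float:
    """Body-weight change relative to a reference weight, in percent (negative = loss)."""
    if reference <= 0:
        raise ValueError("reference weight must be positive")
    return (weight - reference) / reference * 100.0


def max_loss_percent(weights_by_day: dict, reference: float) -> float:
    """Largest loss (as a positive percentage) over the recorded days; 0.0 if weight never fell."""
    return max(0.0, max(-percent_change(w, reference) for w in weights_by_day.values()))


def first_day_over_threshold(weights_by_day: dict, reference: float,
                             threshold_pct: float = 20.0):
    """Return the first day on which the loss reaches `threshold_pct`, or None if it never does."""
    for day in sorted(weights_by_day):
        if -percent_change(weights_by_day[day], reference) >= threshold_pct:
            return day
    return None
```

For a hypothetical animal starting at 25.0 g and weighing 24.0, 21.0, 19.5 and 20.5 g on days 1–4, `first_day_over_threshold` returns day 3 (a 22% loss), even though the animal partly recovers on day 4.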

1Institute for Laboratory Animal Science, Hannover Medical School, Germany
2Institute of Pharmacology, Toxicology, and Pharmacy, Ludwig-Maximilians-University, Germany
3Institute for Laboratory Animal Science & Experimental Surgery and Central Laboratory for Laboratory Animal Science, RWTH Aachen University, Germany
4Department of Neurosurgery, Hannover Medical School, Germany
5Rudolf-Zenker-Institute of Experimental Surgery, University Medical Center, Rostock, Germany
6Institute of Anatomy, University Medical Center, Rostock, Germany
*Heidrun Potschka and Dietmar Zechner are equal contributors.

Corresponding authors:
PD Dr. rer. nat. Dietmar Zechner, Institute for Experimental Surgery, Rostock University Medical Center, Schillingallee 69a, 18057 Rostock, Germany.
Prof. Dr. Heidrun Potschka, Institute of Pharmacology, Toxicology, and Pharmacy, Ludwig-Maximilians-University, Munich, Germany.

The importance of animal welfare is increasingly appreciated by the general public as well as by scientists. At the same time, animal experiments are necessary for basic research and often provide the basis for subsequent clinical studies. For example, one publication reports that, of 76 animal studies published in highly cited journals, 28 (37%) were replicated in human randomized trials, ultimately yielding eight interventions that were subsequently approved for patients.1 This suggests that well-performed animal studies can provide a solid basis for the development of novel therapies. To facilitate animal experiments while improving animal welfare, it is necessary to develop objective measures for evaluating distress in animals and to define humane endpoints.

As early as 1985, Morton and Griffiths proposed that the body weight of animals can be an indicator of distress and that this criterion has the distinct advantage of being objectively measurable.2 In that publication, a body-weight reduction of more than 20% plus no consumption of water or food was defined as a starving condition. In subsequent publications it was suggested that animals that lose more than 20% of their body weight should be killed in order to avoid suffering.3 Consequently, many institutional animal care and use committees in the USA, and also in other countries, adopted the recommendation to consider euthanasia for animals that lost 20% of their body weight, unless a severe outcome for the animals was predicted and approved. Legislation has also embraced the idea of defining humane endpoints for experimental animals. One example is Directive 2010/63/EU of the European Parliament, which demands the use of humane endpoints without defining the percentage of body-weight loss at which an animal must be killed.4 Supplementary information relating to this directive recommends euthanasia of animals when more than 20% or more than 35% of body weight is lost.5 However, a weight loss of 35% is considered an extreme endpoint that requires sound scientific justification.

In some toxicological studies, feed-restriction studies and animal models of colitis, weight losses of 20–30%, 50% and about 40%, respectively, can be observed when compared with the starting weight or the body weight of a control group.6–8 Moreover, some mutant mouse strains can have up to 50% lower body weight than wild-type mice.9 However, absolute boundary values for euthanasia have also been criticized.10 Thus, the purpose of this study was to evaluate whether a 20% body-weight reduction is a valid parameter for defining the humane endpoint in different animal studies. To assess the validity of the parameter under various 'real-life' experimental conditions, we analysed data from animal models with varying group designs and time schedules, which had been completed in different research facilities.

Materials and methods

Data and computation
Body-weight data of rodents from distinct animal experiments were collected from different consortium members and pooled in an online repository (see online Supplementary file 1). Some data have also been used for other publications (see online Supplementary file 2). The absolute body weight or the percentage of body-weight change was assessed centrally. All experiments were carried out in accordance with German legislation and EU Directive 2010/63/EU and were approved by the local authorities (for reference numbers see online Supplementary file 1).

Animal models
Restraint stress model in mice. Female C57BL/6J mice were obtained from the Central Animal Facility (Hannover Medical School, Hannover, Germany). For restraint stress, mice were placed into restraint tubes on 10 consecutive days (days 1–10) for 60 min (from 09.00 to 10.00 am). The restrainers (tubes with 23 mm internal diameter and 93 mm length) consisted of clear acrylic glass with ventilation holes (8 mm diameter) and a 7 mm-wide opening spanning the whole length of the upper side of the tube. Following the restraint procedure, mice were returned to their home cages.11

Streptozocin model in mice. For this study, female MRL/MpJ mice from the Central Animal Facility (Rostock Medical School, Rostock, Germany), which are prone to spontaneously developing autoimmune pancreatitis, were used. Diabetes was induced by ip (intraperitoneal) injection of 50 mg/kg STZ (Sigma-Aldrich, Steinheim, Germany) on five consecutive days (days 1–5). All control mice were sham-treated with the appropriate vehicle (50 mmol/l sodium citrate, pH 4.5).

Dextran sodium sulfate-induced colitis in mice. Female C57BL/6J mice were obtained from the Central Animal Facility (Hannover Medical School, Hannover, Germany). Acute colitis was induced with dextran sulfate sodium (DSS, mol wt 36,000–50,000; MP Biomedicals, Eschwege, Germany). Mice were exposed to 0% (control group, H2O only), 1% or 1.5% DSS in drinking water for five consecutive days (days 1–5) and had access to a running wheel.11

Chemical status epilepticus model in rats. In female Sprague Dawley rats with an electrode in the right hippocampal dentate gyrus, status epilepticus (SE) was induced by fractionated lithium-pilocarpine injections (as described by Glien et al., 200112). SE was pharmacologically terminated after 90 min. Post-SE rats were injected sc (subcutaneously) with Ringer lactate solution for two days and fed with baby food until they resumed normal feeding behaviour.

Intracranial rat glioma model. BT4Ca cells were stereotactically implanted into the right frontal cortex of male BDIX rats. Post-operatively, weight and general health condition were scored daily. Whenever a rat reached score 4 (severe neuronal symptoms, apathy) or lost >20% body weight compared with the pre-surgical weight, the animal was sacrificed. In that condition the animal usually dies within the next few hours; therefore, this criterion was defined as the humane endpoint (see Wu et al., 2018 for more detail).13

Pancreatic resection model in rats. Wistar rats underwent a left-sided pancreatic resection. The resected area was sealed with either a recently developed glue or fibrin, or rinsed with NaCl as the control group. The animals were observed for 14 post-operative days (POD), and body weight as well as severity score were measured.

Liver resection in rats. 50% of the liver was resected in Wistar rats. The resected area was sealed with Vivo 100 or fibrin glue, or rinsed with NaCl. The animals were observed post-operatively for three days, and body weight as well as severity score were determined. In another experiment, a 50% liver resection was performed in Wistar rats (Janvier, France) and the resected area was sealed with a polyurethane-based glue or fibrin glue, or rinsed with NaCl.

Kindling model with Celastrol administration in mice. Male wild-type mice and HSP70-knockout mice were subjected to repeated kindling stimulations via an electrode in the right amygdala (as described by von Rüden et al., 2015).14 Celastrol (1 mg/kg ip; Sigma Aldrich Chemie GmbH, Taufkirchen, Germany) or vehicle (5% ethanol, 0.1% Cremophor EL® in saline) was injected once daily 6 h before the electrical kindling stimulation.

Restrictive feeding in mice. For this study, female C57BL/6J and ApoE-deficient (ApoE-/-) mice were fed either ad libitum (AL) or calorie-restricted (CR, 60% of AL) for 74 weeks (C57BL/6J) or 60 weeks (ApoE-/-).

Dravet syndrome in mice. Conditional knock-in male mice carrying mutation A1783V in exon 26 (B6(Cg)-Scn1atm1.1Dsf/J; #026133) were crossed with heterozygous female mice that express cre recombinase (X-linked to neuronal promoter Hprt gene; 129S1/Sv-Hprttm1(CAG-cre)Mnn/J; #004302). The offspring comprised heterozygous Dravet mice expressing the A1783V mutation with or without Cre, and wild-type mice with or without Cre. Animals received food (ssniff® R/M-H, Sniff, Soest, Germany) and water AL, while Dietgel76A was offered as a supplement (Sniff, Soest, Germany) from P14 until P26.

Statistics
Statistical computing was performed with R software (version 3.4.1) on a 64-bit machine.15 The following packages were used: readxl for data extraction from MS-Excel files and reshape2 for restructuring data to the long format.16,17 Some plots were generated using Prism software (version 6.1, GraphPad Prism Inc.). Group and/or time differences were analysed either by t-test (with Welch correction in case of unequal variances) or by Mann-Whitney U test (Wilcoxon rank-sum test), depending on data distribution. The Shapiro-Wilk test was used to test against the null hypothesis of normality; if normality was rejected, the non-parametric test was used. Results were considered significant at the α = 0.05 threshold.

Results

Adult animals: impact of distress exposure, disease models, surgical procedures and drug treatment

Restraint stress model in mice. Mice exposed to 10 days of restraint stress lost weight during the early exposure phase, resulting in a significantly reduced body weight (Figure 1(a)). However, towards the end of the stress procedure the animals started to regain weight, reducing the difference from control animals.

Streptozocin model in mice. Streptozocin administration in mice induced damage of pancreatic β cells, resulting in a type 1 diabetic phenotype with hyperglycemia. The metabolic alterations in this model caused a progressive loss of body weight, resulting in a significantly reduced body weight in comparison with control mice (Figure 1(b)). In eight animals (8 out of 13), weight loss exceeded 20% during the first two months after streptozocin exposure. These animals had a clinical score not higher than 5 on a scale from 0 to 37 (for details of the clinical score, see online Supplementary file 3). Interestingly, mice that were not euthanized due to weight loss started to regain weight from the third

month onwards (Figure 1(b)). This weight gain was not caused by reduced hyperglycemia, since the mean blood glucose concentration in these mice increased from 21.8 mmol/l (standard deviation (SD) 3.4) on day 61 to 24.9 mmol/l (SD 4.3) on day 89 and increased slightly further until day 113.

Figure 1. Stress paradigms and disease models. (a) Restraint stress model in mice. Animals exposed to restraint stress show a significant loss in body weight compared with control animals on days 1–6, 8, 9, 13 and 14 (t-test, p ≤ 0.05, with Welch correction in case of unequal variances; n = 8; error bars are standard deviation). The maximum drop in body weight in single animals compared with starting conditions occurred on days 2 and 4, at 19%. No animal exceeded the 20% body-weight-loss threshold. (b) Diabetes in MRL/MpJ mice. After induction of diabetes by streptozocin (STZ), mice lost significantly more body weight than control animals (Sham). The difference in weight loss between animals with streptozocin-induced pancreatitis and untreated control animals became significant at day 22 and remained so until day 113 (t-test or Mann-Whitney U test depending on data distribution, p ≤ 0.05, n_ctrl = 19, n_model = 13; error bars are standard deviation). In one mouse, a maximum body-weight loss of 32% was observed on day 89. Nine animals dropped below the 20% body-weight threshold. (c) Dextran sulfate sodium (DSS)-induced colitis in mice, 0% DSS v. 1% DSS. Mice reached a body weight that did not differ significantly from control animals at day nine (t-test with Welch correction, p ≤ 0.05, n_ctrl = 7, n_model = 8; error bars are standard deviation). (d) DSS-induced colitis in mice, 0% DSS v. 1.5% DSS. Day 5 shows a significant drop in body weight compared with control animals (t-test, p ≤ 0.05, n_ctrl = 7, n_model = 8; error bars are standard deviation). Two animals dropped below the 20% body-weight threshold and had to be euthanized. (e) Chemical status epilepticus in rats resulted in a rapid, significant weight loss, visible one day after SE compared with electrode-implanted sham rats without induced SE (t-test, p ≤ 0.0001, n_ctrl = 12, n_model = 15; error bars are standard deviation). (f) Intracranial glioma model. Change in body weight (%) of rats with intracranial tumour during the 8 days before perfusion (P). Body weight on day 8 before perfusion was set as 100%. Data are shown as mean ± standard deviation. Significant differences compared with day 8 are indicated with an asterisk (one-tailed, one-sample t-test, p ≤ 0.01, n = 10).

DSS-induced colitis in mice. DSS-induced colitis represents a widely applied model of intestinal inflammation. Oral administration of 1% DSS via drinking water had no relevant impact on body weight (Figure 1(c)). In contrast, mice exposed to 1.5% DSS in the drinking water for five consecutive days showed a pronounced drop in body weight. The weight loss exceeded 20% in two animals (2 out of 8), which were then euthanized. These animals had clinical scores of 0 and 3 on a scale from 0 to 17 (for details of the clinical score, see online Supplementary file 3). Five days after the end of oral DSS administration, the animals started to recover. Mice reached a body weight that did not differ significantly from control animals at day nine following the termination of DSS exposure (Figure 1(d)).

Chemical SE model in rats. Pilocarpine is frequently used to induce SE in laboratory rodents. SE with administration of the anticonvulsant diazepam 90 min after onset of SE resulted in a rapid weight loss.

material. Regardless of the treatment of the resection surface, a minor transient body-weight loss was observed in the early post-surgical phase in the majority of animals (Figure 2(a)). None of the animals lost more than 20% of their body weight. On POD8, the animals exceeded their initial operation weight.

Liver resection in rats. During partial resection of the liver, a recently developed glue (VIVO-100) was tested in comparison with fibrin for adhesion of the resection surface (Figure 2(b)). In control animals, saline was administered instead of the adhesive material. Regardless of the treatment of the resection surface, the animals exhibited only a slight drop in body weight following surgery, which did not exceed 20%. In another experiment, a 50% liver resection was performed on rats with sealing of the resection site with PUG, fibrin or saline. Regardless of group, the animals exhibited a significant weight loss on POD1–3 compared with their operation weight (Figure 2(c)). This was followed by an increase in weight at POD4, exceeding the initial weight.

Kindling model with Celastrol administration in mice. The kindling model with once-daily seizure induction represents a frequently used model of temporal lobe epilepsy. The consequences of genetic
and pharmacological targeting of an inducible heat One day following SE the body weight proved to be sig- shock protein were assessed in this paradigm. Daily nificantly reduced as compared to naı¨ ve rats and elec- injections of Celastrol significantly lowered overall trode-implanted rats without induction of SE (sham) body weight in wild-type mice and HSPA1 knockout (Figure 1(e)). However, during the next two days the mice in the kindling model of temporal lobe epilepsy animal regained the lost weight. Subsequently, a higher (Figure 3(a) and (b)). The treatment protocol has weight gain was evident in SE-exposed animals result- been slightly adjusted introducing two interim ing in a mean body weight exceeding that in both con- phases without treatment in order to avoid a too pro- trol groups 14 and 21 days following SE (Figure 1(e)). nounced weight loss. During these phases without drug administration, animals regained weight Intracranial rat glioma model. After initial surgery for (Figure 3(a) and (b)). stereotaxic injection of BT4Ca glioma cells rats were in good health condition until shortly before finalizing the Growth curves: impact of restrictive feeding experiment with a mean survival time of 16 days. and of a genetic deficiency A minor loss of body weight (mean weight loss of 2.1% compared to the previous day), together with a Restrictive feeding. In order to assess beneficial slight deterioration of the general health condition was effects on cognitive performance, CR feeding was found about 2 days before reaching endpoint criterion. initiated at an age of four weeks in C57BL6/J This was followed by a severe deterioration of the (Figure 4(a)) mice as well as ApoE-/- mice (Figure clinical score and more pronounced weight loss (mean 4(b)). In comparison with mice fed AL, the body weight loss of 5.2% compared to the previous day) on weight of mice with CR feeding proved to be reduced the following day (Figure 1(f)). throughout the study by more than 20%. 
(Figure 4(a) and (b)). However, even mice fed with a CR diet gained Pancreatic resection in rats. In a rat pancreatic body weight. resection model the resection surface was sealed with three different synthetic tissue glues or fibrin. In control Dravet syndrome in mice. The Dravet syndrome is a animals, saline was administered instead of the adhesive rare genetic epileptic encephalopathy characterized by 104 Laboratory Animals 54(1)

Figure 2. Surgery. (a) Pancreatic resection model in rats. Following a left sided pancreatic resection in Wistar rats the resected area was non sealed (NaCl group) or sealed either with a fibrin glue (TISSEEL) or polyurethane-based glues (PUG) (AM 110, 115 or 107). The animals experience a rapid weight loss up to day 4. After this time point the animals recovered and the weight gain to preoperative values was reached at day 7. Significant differences (*) indicate that data points are different from zero (one-tailed t-test, p 0.01, n ¼ 3–9, error bars are standard deviation, n ¼ 4). (b) 50% liver resection and com- parison of recently developed glue. Following a 50% liver resection in Wistar rats, the resected area was non sealed (NaCl group) or sealed either with a fibrin glue (TISSEEL) or PUG (Vivo 100). The animals experience a rapid weight loss in the first 3 post-operative days. This weight loss is significant different to the pre-operative weight. A one-sample t-test was used in testing the samples whether they differed from zero body weight change (%). After day 0 all samples were significantly different from zero (one-tailed t-test, p 0.01, n ¼ 3–9, error bars are standard deviation). (c) 50% liver resection and severity assessment. 50% liver resection in Wistar rats has been performed. The resected area was either glued with fibrin, PUG or rinsed with NaCl. The animals experience a rapid weight loss in the first 3 post-operative days. This weight loss is significantly different to the pre-operative weight. After this time point the animals recovered and the weight gain to preoperative values was reached at day 4. A one-sample t-test was used to test for significant differences to zero body weight change (%). On day 1–3 values are significantly different from zero (one-tailed t-test, p 0.05 n ¼ 21, error bars are standard deviation).

difficult-to-treat seizures as well as cognitive and motor a significantly increased weight gain from day 5 until impairment.18 In a knockin Dravet mouse model, a day 27 post weaning (Figure 4(c)). As a consequence, lower body weight was evident at the time of weaning the body weight of Dravet mice reaches a comparable (data not shown). This is compensated over time as range to wild-type mice approximately two weeks fol- Dravet mice exhibit a steeper weight gain curve with lowing weaning (data not shown). Talbot et al. 105
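Most weight data in these results are expressed as percent change relative to a reference weight (the pre-operative weight, the starting weight, or the weight on day 8 before perfusion set to 100%), while the glioma results use day-over-day change. The minimal Python sketch below shows both calculations and a simple check against the 20% loss threshold. The function names and the daily weights are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, List


def percent_change_from_baseline(weights: List[float]) -> List[float]:
    """Change (%) of every weighing relative to the first one (the baseline).

    The baseline can be the pre-operative weight, the starting weight, or any
    chosen reference day; it maps to 0% here (i.e. 100% of baseline weight).
    """
    baseline = weights[0]
    return [100.0 * (w - baseline) / baseline for w in weights]


def percent_change_from_previous_day(weights: List[float]) -> List[float]:
    """Day-over-day change (%): each weighing relative to the one before it."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(weights, weights[1:])]


def exceeds_loss_threshold(weights: List[float], max_loss_percent: float = 20.0) -> bool:
    """True if any weighing lies MORE than `max_loss_percent` below baseline."""
    return min(percent_change_from_baseline(weights)) < -max_loss_percent


# Hypothetical daily weights (g) of two mice; not data from the study.
animals: Dict[str, List[float]] = {
    "mouse_A": [25.0, 24.0, 22.5, 21.0, 22.0],
    "mouse_B": [24.0, 22.0, 19.5, 18.8, 19.6],
}

for name, series in animals.items():
    change = [round(c, 1) for c in percent_change_from_baseline(series)]
    print(name, change, exceeds_loss_threshold(series))
```

With these inputs, mouse_B falls more than 20% below baseline on its fourth weighing and then partly recovers, while mouse_A never crosses the threshold. As the article argues, such a flag is only one input to an endpoint decision, not a decision rule on its own.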

Figure 3. Drug treatment. (a) Kindling model with celastrol administration in wild-type mice. Daily injections of celastrol significantly lowered overall body weight in wild-type mice (WT) and HSPA1 knockout mice (KO) in the kindling model of temporal lobe epilepsy (t-test, p < 0.01 from day 1 to 17, nctrl = 8, nmodel = 9, error bars are standard deviation). (b) Kindling model with celastrol administration in knockout mice (t-test, p < 0.05 from day 1 to 13 and on day 17, nctrl = 11, nmodel = 14, error bars are standard deviation).
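The figure captions report group comparisons by t-test, in Figure 1 with Welch correction when variances are unequal. As a hedged illustration (the article's reference list cites R for its statistics; this Python version is only a sketch), Welch's statistic divides the difference in group means by sqrt(s_a²/n_a + s_b²/n_b) and uses Welch–Satterthwaite degrees of freedom instead of pooling the variances. The group values below are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group keeps its own sample variance (no pooling), which is what
    makes the test robust to unequal variances between groups.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each group mean; statistics.variance uses n - 1.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical body-weight changes (%) on a single day; not data from the study.
treated = [-6.1, -4.8, -7.3, -5.5, -6.9, -4.2, -8.0, -5.1, -6.4]
control = [-0.5, 0.8, -1.2, 0.3, 1.1, -0.7, 0.4, 0.0]
t, df = welch_t(treated, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The p-value would then be read from a t distribution with `df` degrees of freedom; in SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test. Repeating such a test day by day, as the captions do, raises multiple-comparison questions that this sketch does not address.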

Discussion

Body weight as a severity assessment parameter: simple to assess but difficult to interpret

At first glance, body weight seems to be a simple parameter, which is easy to assess objectively and can inform severity assessment scoring systems and decisions about humane endpoints. However, interpreting body weight loss in the context of severity assessment is also challenging, because appetite, food intake and body weight development are regulated in a very complex manner.19 The interpretation is further complicated by the fact that weight loss can be a symptom of various conditions associated with reduced appetite, metabolic alterations, increased energy expenditure, or malabsorption.

As discussed further below, there are several reasons why weight loss is considered a parameter for severity assessment and humane endpoints. On the one hand, weight loss can reflect decreased appetite as a consequence of distress, fear and pain.20 On the other hand, it can also indicate progression of a chronic disease, reflecting deterioration with an increasing burden for the animals. Moreover, weight loss for different reasons can indicate a state of starvation, which, if it reaches a specific level, can directly cause distress in affected animals.

Weight loss as a consequence of restrictive feeding, malabsorption, metabolic alterations, or drug effects on appetite control

The question of which level of weight loss should be considered a burden in itself is relevant, for instance, for experimental studies in which a lowered body weight is a consequence of caloric restriction, changes in metabolism, or malabsorption. It needs to be taken into account that the majority of experimental animals receive food AL. Interestingly, caloric restriction can exert beneficial effects, indicating overfeeding and overweight as a consequence of AL feeding.21–26 On the other hand, detrimental consequences such as an aggravation of age-related impairment of activity and anxiety-associated behaviour are also possible.21

With the CR feeding protocols applied in the present studies, none of the animals lost more than 20% of their starting weight. However, since the animals are still in the growth phase when CR feeding is initiated, it is better to compare the body weight of these mice with that of mice fed AL. Nevertheless, a decision about a humane endpoint should not be based exclusively on a 20% difference in body weight either. More research is needed for an evidence-based definition of limits of tolerable weight differences that considers the benefits of caloric restriction and other readout parameters measuring distress.
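The point about growing animals can be made concrete with simple arithmetic. In the hypothetical numbers below (not study data), a young mouse on caloric restriction gains about 33% over its own starting weight, yet on the same day it is about 23% lighter than the mean of age-matched AL mice. A check against the animal's own baseline would never flag it, whereas a comparison with AL controls shows a difference above 20%; which of these figures should drive an endpoint decision is exactly the open question raised above. The helper names are illustrative assumptions.

```python
from statistics import mean
from typing import List


def change_from_own_baseline(series: List[float]) -> float:
    """Change (%) from an animal's first to its last weighing; positive = gain."""
    return 100.0 * (series[-1] - series[0]) / series[0]


def deficit_vs_reference(weight: float, reference_weights: List[float]) -> float:
    """How far (%) a weight lies below the mean of a reference group
    (e.g. age-matched AL mice weighed on the same day); positive = lighter."""
    ref = mean(reference_weights)
    return 100.0 * (ref - weight) / ref


# Hypothetical weights (g), not data from the study.
cr_mouse = [15.0, 17.0, 19.0, 20.0]      # CR mouse, weighed weekly from age 4 weeks
al_mice_same_day = [26.0, 27.0, 25.0]    # AL mice on the day of the last CR weighing

print(f"{change_from_own_baseline(cr_mouse):.1f}")                    # 33.3 (% gain vs own start)
print(f"{deficit_vs_reference(cr_mouse[-1], al_mice_same_day):.1f}")  # 23.1 (% below AL mean)
```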

Figure 4. Impact of restrictive feeding on body weight. (a) Impact of restrictive feeding on C57BL/6J mice. Animals fed in a caloric restrictive (CR) manner had significantly lower body weight than mice fed ad libitum (AL). Daily body weight differed significantly from week 4 to week 74 (Wilcoxon rank sum test, *p < 0.05, nAL = 10, nCR = 10; three mice fed AL died, no mouse fed in a CR manner died; error bars are standard deviation). (b) Impact of restrictive feeding on ApoE-/- mice. A Wilcoxon rank sum test showed significant differences for weeks 4–60 (*p < 0.05, nAL = 10, nCR = 10, error bars are standard deviation). (c) Dravet syndrome in mice. Animals with Dravet syndrome showed lower body weight at the time of weaning, but increased weight gain following weaning (day 0). Dravet animals showed significantly increased weight gain compared with wild-type animals on the indicated days (*p < 0.05, nWT = 10, nDravet = 10, error bars are standard deviation).
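The Wilcoxon rank sum test used for Figure 4 is equivalent to the Mann–Whitney U test. Its statistic can be computed directly by counting, over all AL/CR pairs, how often one group's weight exceeds the other's, with ties counting one half. The sketch below shows only the statistic, not the p-value (SciPy provides `scipy.stats.mannwhitneyu` for that); the weights are hypothetical.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x: the number of pairs (xi, yj) with xi > yj,
    counting ties as 0.5. U_x + U_y always equals len(x) * len(y)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical body weights (g) at one time point; not data from the study.
al = [29.5, 31.0, 30.2, 28.8, 32.1]
cr = [22.4, 23.0, 21.7, 24.1, 22.9]
u_al = mann_whitney_u(al, cr)
u_cr = mann_whitney_u(cr, al)
print(u_al, u_cr)  # 25.0 0.0: every AL mouse is heavier than every CR mouse
```

This pairwise form is O(n·m), which is unproblematic for group sizes such as n = 10; rank-based implementations give the same U more efficiently for large samples.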

As mentioned above, the same question applies to models in which weight loss is related to metabolic alterations or malabsorption. In the DSS colitis model, weight loss was slight and transient in response to 1% DSS but exceeded 20% in some of the animals exposed to 1.5% DSS. In affected animals, we did not observe an association with a severe deterioration of the clinical state, bloody diarrhea being the only other symptom. Thus, despite the weight loss, the impairment of the clinical state of the affected animals remained mild, whereas wheel-running performance decreased to nearly 0%, consistent with the weight loss. A very similar observation was made after induction of diabetes with streptozocin. Several mice lost more than 20% of their body weight within two months without a deterioration of the clinical state or signs of distress or pain (for parameters see online Supplementary file 3). In addition, after the first two months of diabetes the body weight increased again. In such situations, a strict application of 20% weight loss as a criterion for euthanasia might result in an unnecessary loss of animals, with the consequence that higher animal numbers are needed for the experiment.

Another example that raises the question of the relevance of weight loss for severity is drug effects on appetite regulation or energy expenditure.27 Such an effect was observed with Celastrol, which was administered to modulate neuroinflammatory responses in a chronic epilepsy model. Previous studies reported that Celastrol causes increased energy expenditure.28 In our experiments, Celastrol-treated mice lost weight without persistent additional clinical symptoms apart from acute adverse effects in the hours following administration. Thus, the data indicate that the effect was more likely related to drug-induced metabolic alterations. Because strict regulations applied to this experiment, we implemented treatment-free intervals to avoid weight loss exceeding 15%. The experience during this study again raises the question of which level of weight loss should be considered a burden for the animals.

Considerations for interventions with transient weight loss

Specific considerations also seem to be necessary for interventions that are associated with a transient weight loss. This was observed in animals following partial liver or pancreas resection, during the early phase of restraint stress, and following SE induction. While weight loss exceeding 20% was an exception in our experiments, it is nevertheless debatable whether animals should be euthanized because of a drop in body weight that is expected, with a high level of certainty, to be transient. As discussed above, losing animals as a consequence would imply the need for higher animal numbers. In this context, a transient 20% weight loss in the absence of other clinical symptoms should be considered a questionable humane endpoint in view of the overarching 3Rs objectives.

Weight loss as a parameter reflecting distress

It is well known that pathophysiological factors related to distress, fear and pain can exert a strong impact on appetite, thereby causing weight loss.20,29 Our findings further confirmed the impact of distress on body-weight development, with restraint stress causing a transient drop in body weight during the early exposure phase. Thus, the present data further support the relevance of thorough body-weight monitoring for severity assessment in animals exposed to stressful procedures.

Weight loss as an early marker for discontinuation

In models with disease progression, there is an urgent need for early humane endpoint markers for discontinuation decisions.30,31 In this context, body-weight reduction has been confirmed as a helpful parameter in different experimental scenarios.9,32,33 Following the same concept, our analysis in an intracranial glioma model revealed that a body weight drop within 1–2 days that does not exceed 20% is a reliable predictor of clinical deterioration. Please note that more specific markers for endpoint determination will be discussed in detail in a separate publication.

Considerations for growth curves in young animals

The assessment of growth curves during the development of young animals also requires specific considerations. These are of particular relevance for severity assessment in genetically modified animals. The weight development of Dravet mice is an interesting example, with a reduced body weight evident at the time of weaning but a more pronounced weight gain following weaning. The reduced body weight at the time of weaning might be related to maternal neglect of affected animals. The strong weight gain following weaning indicates that the lower body weight is in this case not a result of inappetence or illness-associated weight reduction. Our data, therefore, suggest that a transiently lowered body weight during a specific developmental phase should always be evaluated in the context of other clinical signs, since loss of animals due to euthanasia can imply the need for higher animal numbers.

Conclusions and future recommendations

In conclusion, the fact that weight loss can have different causes and can take different courses strongly suggests that experiment- and model-specific considerations are necessary for the rational integration of the parameter 'weight loss' into severity assessment schemes and humane endpoint criteria. For example, a weight loss of 20% as the sole criterion for euthanasia would lead to the premature death of diabetic animals. In the intracranial glioma model, however, a body weight loss of less than 20% is already a reliable predictor of clinical deterioration. Thus, each animal model is unique and requires tailor-made humane endpoints.

In this context, it is important to consider that the less pronounced weight loss observed after different interventions indicates that weight loss of less than 20% should be considered in clinical scoring sheets, with consequences depending on the animal model. This study, therefore, suggests that the decision for euthanasia should not be based solely on an arbitrary percentage of body-weight change, but should always consider other parameters indicating pain or distress as well as animal model-specific considerations.

Acknowledgements

We thank Ilona Klamfuß (Rudolf-Zenker-Institute of Experimental Surgery, Rostock University Medical Center) and Zhiqun Wu (Department of Neurosurgery, Hannover Medical School) for excellent technical assistance. We (Institute of Pharmacology, Toxicology, and Pharmacy, Ludwig-Maximilians-University, Munich) thank Uwe Birett, Sabine Vican, Katharina Gabriel, Katharina Schönhoff, Sarah Driebusch, Sieglinde Fischlein, Tamara Lindemann, Carmen Meyer, Sabine Saß, and Claudia Siegl for their excellent technical assistance. The study is properly described and the appropriate reporting guidelines have been followed. The methods used were appropriate and are described fully, the results are presented clearly and the conclusions are supported by the results. Also, any relevant ethical approval and consents have been obtained and included in the paper.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: Funding information on each study is given in online Supplementary file 1.

ORCID iDs

Steven R Talbot https://orcid.org/0000-0002-9062-4065
Andre Bleich https://orcid.org/0000-0002-3438-0254
Christine Häger https://orcid.org/0000-0002-6971-9780
Rene Tolba https://orcid.org/0000-0002-0383-3994
Heidrun Potschka https://orcid.org/0000-0003-1506-0252
Dietmar Zechner https://orcid.org/0000-0002-2075-7540

Supplemental Material

Supplemental material is available for this article online.

References

1. Hackam DG and Redelmeier DA. Translation of research evidence from animals to humans. JAMA 2006; 296: 1731–1732.
2. Morton DB and Griffiths PH. Guidelines on the recognition of pain, distress and discomfort in experimental animals and an hypothesis for assessment. Vet Rec 1985; 116: 431–436.
3. Morton DB. A systematic approach for establishing humane endpoints. ILAR J 2000; 41: 80–86.
4. Ricceri L and Vitale A. The law through the eye of a needle. EMBO Rep 2011; 12: 637–640.
5. Expert Work Group (EWG) for the assessment of severity of procedures. Caring for animals: aiming for better science, http://ec.europa.eu/environment/chemicals/lab_animals/pdf/guidance/severity/en.pdf (accessed 12 February 2019).
6. Pohjanvirta R, Tuomisto J, Vartiainen T, et al. Han/Wistar rats are exceptionally resistant to TCDD. I. Pharmacol Toxicol 1987; 60: 145–150.
7. Cappon GD, Fleeman TL, Chapin RE, et al. Effects of feed restriction during organogenesis on embryo-fetal development in rabbit. Birth Defects Res Part B Dev Reprod Toxicol 2005; 74: 424–430.
8. Ding A and Wen X. Dandelion root extract protects NCM460 colonic cells and relieves experimental mouse colitis. J Nat Med 2018; 72: 857–866.
9. Ott B, Dahlke C, Meller K, et al. Implementation of a manual for working with wobbler mice and criteria for discontinuation of the experiment. Ann Anat 2015; 200: 118–124.
10. Franco NH, Correia-Neves M and Olsson IAS. How 'humane' is your endpoint? Refining the science-driven approach for termination of animal studies of chronic infection. PLoS Pathog 2012; 8: e1002399.
11. Häger C, Keubler LM, Talbot SR, et al. Running in the wheel: defining individual severity levels in mice. PLoS Biol 2018; 16: e2006159.
12. Glien M, Brandt C, Potschka H, et al. Repeated low-dose treatment of rats with pilocarpine: low mortality but high proportion of rats developing epilepsy. Epilepsy Res 2001; 46: 111–119.
13. Wu Z, Nakamura M, Krauss JK, et al. Intracranial rat glioma model for tumor resection and local treatment. J Neurosci Methods 2018; 299: 1–7.
14. von Rüden EL, Jafari M, Bogdanovic RM, et al. Analysis in conditional cannabinoid 1 receptor-knockout mice reveals neuronal subpopulation-specific effects on epileptogenesis in the kindling paradigm. Neurobiol Dis 2015; 73: 334–347.
15. R Core Team. R: A language and environment for statistical computing, http://www.r-project.org (2019).
16. Wickham H and Bryan J. readxl: Read Excel files. R package version 0.1, https://cran.r-project.org/package=readxl (2019).
17. Wickham H. Reshaping data with the reshape package. J Stat Softw 2007; 21: 1–20.
18. Wallace A, Wirrell E and Kenney-Jung DL. Pharmacotherapy for Dravet syndrome. Pediatr Drugs 2016; 18: 197–208.
19. Heisler LK and Lam DD. An appetite for life: brain regulation of hunger and satiety. Curr Opin Pharmacol 2017; 37: 100–106.
20. Andermann ML and Lowell BB. Toward a wiring diagram understanding of appetite control. Neuron 2017; 95: 757–778.
21. Kuhla A, Lange S, Holzmann C, et al. Lifelong caloric restriction increases working memory in mice. PLoS One 2013; 8: e68778.

22. Rühlmann C, Wölk T, Blümel T, et al. Long-term caloric restriction in ApoE-deficient mice results in neuroprotection via Fgf21-induced AMPK/mTOR pathway. Aging (Albany NY) 2016; 8: 2777–2789.
23. Kuhla A, Hahn S, Butschkau A, et al. Lifelong caloric restriction reprograms hepatic fat metabolism in mice. J Gerontol A Biol Sci Med Sci 2014; 69: 915–922.
24. Gillette-Guyonnet S and Vellas B. Caloric restriction and brain function. Curr Opin Clin Nutr Metab Care 2008; 11: 686–692.
25. Mantis JG, Centeno NA, Todorova MT, et al. Management of multifactorial idiopathic epilepsy in EL mice with caloric restriction and the ketogenic diet: role of glucose and ketone bodies. Nutr Metab 2004; 1: 11.
26. Li L, Sawashita J, Ding X, et al. Caloric restriction reduces the systemic progression of mouse AApoAII amyloidosis. PLoS One 2017; 12: e0172402.
27. Rogers RC, McDougal DH and Hermann GE. Hindbrain astrocyte glucodetectors and counterregulation. In: Harris RBS (ed.) Appetite and Food Intake: Central Control, 2nd edn. Boca Raton, FL: CRC Press/Taylor & Francis, pp.205–228.
28. Ma X, Xu L, Alberobello AT, et al. Celastrol protects against obesity and metabolic dysfunction through activation of a HSF1-PGC1α transcriptional axis. Cell Metab 2015; 22: 695–708.
29. Monteiro S, Roque S, de Sá-Calçada D, et al. An efficient chronic unpredictable stress protocol to induce stress-related responses in C57BL/6 mice. Front Psychiatry 2015; 6: 1–11.
30. Ashall V and Millar K. An opportunity to refocus on the 'humane' in experimental endpoints: moving beyond Directive 2010/63/EU. ATLA Altern to Lab Anim 2013; 41(4): 307–312.
31. Ashall V and Millar K. Endpoint matrix: a conceptual tool to promote consideration of the multiple dimensions of humane endpoints. ALTEX 2014; 31: 209–213.
32. Roughan JV, Coulter CA, Flecknell PA, et al. The conditioned place preference test for assessing welfare consequences and potential refinements in a mouse bladder cancer model. PLoS One 2014; 9: 1–11.
33. Hankenson FC, Ruskoski N, Saun M Van, et al. Weight loss and reduced body temperature determine humane endpoints in a mouse model of ocular herpesvirus infection. J Am Assoc Lab Anim Sci 2013; 52: 277–285.

Résumé In many animal experiments, scientists and local authorities define a body weight reduction of 20% or more as a source of severe suffering and consequently as a potential humane endpoint parameter for euthanasia decisions. In this study, we evaluated distinct animal experiments across numerous research facilities and assessed whether a 20% weight reduction was a valid euthanasia criterion in rodents. In most experiments (restraint stress, distinct models of epilepsy, pancreatic resection, liver resection, calorie-restricted feeding and a mouse model of Dravet syndrome), the animals lost less than 20% of their body weight. In a glioma model, a rapid weight decline of less than 20% was observed and considered a reliable predictor of clinical deterioration. In contrast, after the induction of chronic diabetes or acute colitis, some animals lost more than 20% of their body weight without displaying major signs of distress. In these two animal models, exclusive application of the 20% weight-loss criterion to decide on euthanasia could therefore lead to an unnecessary loss of animals. However, we also confirmed that this criterion can be a valid parameter for defining the humane endpoint in other animal models, particularly when combined with other criteria for assessing distress. In conclusion, our results strongly suggest that experiment- and model-specific factors must be taken into account for the rational integration of the 'weight loss' parameter into severity assessment systems and euthanasia criteria. A flexible implementation by scientists and authorities, adapted to the experiment or intervention, is therefore strongly recommended.

Abstract In many animal experiments, scientists and local authorities define a weight loss of 20% or more as severe suffering and thus as a possible parameter for humane endpoint decisions. In the present study, we evaluated various animal experiments in different research facilities and examined whether a weight reduction of 20% is a valid humane endpoint criterion in rodents. In most experiments (restraint stress, various models of epilepsy, pancreatic resection, liver resection, calorie-restricted feeding and a mouse model of Dravet syndrome), the animals lost less than 20% of their original body weight. In a glioma model, a rapid decrease in body weight of less than 20% was observed to be a reliable indicator of clinical deterioration. In contrast, some animals lost more than 20% of their body weight after induction of chronic diabetes or acute colitis without showing major signs of distress. In these two animal models, exclusive application of the 20% weight-loss criterion for killing would therefore likely lead to an unnecessary loss of animals. However, we also confirmed that this criterion can be a valid parameter for defining the humane endpoint in other animal models, especially when combined with further criteria for assessing distress. In summary, our results strongly indicate that experiment- and model-specific considerations are necessary for a meaningful integration of the parameter 'weight loss' into severity assessment systems and humane endpoint criteria. A flexible implementation by scientists and authorities, tailored to the experiment or intervention, is therefore strongly recommended.

Resumen In many animal experiments, both scientists and local authorities define a body weight reduction of 20% or more as severe suffering and, therefore, as a potential parameter for humane endpoint decisions. In this study we evaluated various animal experiments in research facilities and analysed whether a 20% reduction in body weight is a valid criterion for the humane killing of rodents. In most experiments (restraint stress, various epilepsy models, pancreatic resection, hepatic resection, calorie-restricted feeding and a rodent model of Dravet syndrome), the animals lost less than 20% of their original body weight. In a glioma model, a rapid decrease in body weight of less than 20% was observed to be a reliable indicator of clinical deterioration. On the other hand, after induction of chronic diabetes or acute colitis, some animals lost more than 20% of their body weight without showing any marked sign of discomfort. In these two animal models, a strict application of the criterion of euthanizing animals that have lost 20% of their body weight could result in an unnecessary loss of animals. However, we also confirmed that this criterion can be a valid parameter for defining a humane endpoint in other animal models, especially when combined with additional criteria for assessing distress. In conclusion, our study strongly suggests that model- and experiment-specific considerations are necessary for the rational integration of the 'weight loss' parameter into severity assessment schemes and humane endpoint criteria. A flexible implementation by scientists and authorities, adapted to the experiment and the intervention, is therefore recommended.


News

Life cycle of a FELASA working group

Jean-Philippe Mocho

Laboratory Animals 2020, Vol. 54(1) 113. © The Author(s) 2019. Article reuse guidelines: sagepub.com/journals-permissions. DOI: 10.1177/0023677219897534. journals.sagepub.com/home/lan

Over the years, FELASA has supported over 30 working groups, covering a wide range of topics, e.g. education and training, ethics, husbandry and care, and health monitoring. According to Scopus, the latter theme has triggered well over 800 citations of FELASA recommendations.

All working groups start with a suggestion made by a member of the FELASA Board of Management, the Executive Committee or a working group, a meeting attendee, a colleague... anyone interested in LAS! Suggestions can be sent to [email protected].

The suggestion is then discussed by the Executive Committee and the Board of Management, which has to approve the concept before it is developed further. Terms of Reference are then written by the proposer or by volunteers from the committees. The Board of Management votes for the final approval of the Terms of Reference.

Now that the concept is well defined and delimited, and the objectives are detailed, the member associations nominate candidates for working group membership. Most working groups rely on six to eight members, including one convener. The FELASA Executive Committee selects the working group members and the convener based on the candidates' expertise, while ensuring wide representation of the FELASA membership.

The working group can now start; it has 2 years to provide a first draft to FELASA. The internal review process is lengthy, since all member associations are given 4–6 weeks to send comments, and a few rounds of comments are often required. At the end, a final draft is sent to Laboratory Animals for Open Access CC BY publication. There are exceptions to the above process when the working group is a collaboration with other associations.

Publication of its recommendations often terminates a working group's activities. However, all working groups are asked to present their findings at a FELASA triennial congress, so that feedback from the community is received, before or after publication.

Very much looking forward to seeing you in Marseille (http://www.felasa2022.eu/) to discuss working group progress!

FELASA Honorary Secretary, FELASA, Eye, UK

Corresponding author: J-P Mocho, FELASA, PO Box 372, Eye, IP22 9BR, UK. Email: [email protected]

Contributions to the News section are not subject to peer review and reflect the opinion of your subscribed society.

News

Educational pamphlet 'My Experience with Science' ('Mi experiencia con la ciencia'): A SECAL initiative to give children an insight into scientific experimentation

Laboratory Animals 2020, Vol. 54(1) 115–117. © The Author(s) 2019. DOI: 10.1177/0023677219886436

Hernán Serna


Figure 2. Back cover, Educational pamphlet 2019.


SECAL Board of Directors (Junta de Gobierno de la SECAL), Spain

Corresponding author: Hernán Serna, Junta de Gobierno de la SECAL, Colegio Oficial de Veterinarios de Madrid (COLVEMA), C/ Maestro Ripoll nº 8, 28006 Madrid, Spain. Email: [email protected]

Figure 1. Cover, Educational pamphlet 2019.

Figure 3. Activities about Danio rerio.

SECAL (Spanish Society for Laboratory Animal Sciences) has developed a project to generate informative material on experimentation with animals aimed at school-aged children.

One of the main objectives of SECAL is to promote initiatives that generate greater knowledge and understanding in society about the use of animals in scientific research. This is furthermore one of the commitments made in the Transparency Agreement on the use of animals in experimentation and teaching in Spain, promoted by our society in 2013.

In September 2019, thanks to the invaluable support of Laboratory Animals Limited (LAL), the Educational Pamphlet #2, 'My Experience with Science', was published.

The goal of SECAL and LAL is to achieve maximum distribution and use of the resource, which can therefore be downloaded for free in Spanish and in pdf format via the following link: https://secal.es/cartillas-mi-experiencia-con-la-ciencia/.

Our idea is that knowledge about what we do needs to be conveyed to the general public in a way that achieves universal understanding. The aim is to bring our world closer to the youngest members of society in a straightforward manner, so that in the future words such as refine, cure, reduce and replace become, for them, an intrinsic part of what we do.

We seek to publicise the importance of animal experimentation in the advancement of biomedical science among a younger audience through fun, educational games. The educational materials are full of drawings and activities that allow children to learn through play.

They are told, for example, that zebrafish are used in the search for cures for disease while tracing a fish in a fishbowl, or that animals and humans have a lot in common, while discovering the animal hidden in silhouette form.

One of the most successful activities is identifying the relationship between an animal's tracks and the text relating it to a disease. They learn in simple terms, for example, that frogs contributed to the development of the asthma inhalers we all know, or that pigs played a role in refining and improving kidney transplants.

The idea is that the booklet can be freely used in children's activities for scientific teaching in school or at home. We are considering translating the booklets in the future to make them even more widely available.

Figure 4. Activities about word.

Calendar of events

Meetings of interest to laboratory animal scientists and technicians: references to Laboratory Animals are for further details. Items for inclusion should be sent to Notes and Comments Editor, LAL, PO Box 373, Eye, Suffolk, IP22 9BS, UK. Email: [email protected]. The deadlines for inclusion of material are: February issue, 10 November; April issue, 10 January; June issue, 10 March; August issue, 10 May; October issue, 10 July; December issue, 10 September.

2020

13–17 January: ESLAV ECLAM 2020 Winter School on Statistics and Experimental Design, Dublin, Ireland. For further information visit http://eslav-eclam.org/education/eslav-eclam-winter-school-on-systematic-reviews/statistics-and-experimental-design-13th-to-17th-of-january-2020/
23 March: RSPCA Focus on Fish 2020 meeting, Edinburgh, UK. For further information contact [email protected]
23–26 March: Animal Science and Technology (AST) Conference 2020, Edinburgh, UK. For further information visit https://www.ast2020.org/
31 March–3 April: Microbiology Society Annual Conference 2020, Edinburgh, UK. For further information visit https://microbiologysociety.org/event/annual-conference/annual-conference-2020.html
28–30 April: Training School in Experimental Design and Statistical Analysis of Bioscience and Biomedical Experiments, Edinburgh, UK. For further information visit https://frame-trainingschool2020.eventbrite.co.uk
29 April: LASA PELH Spring Forum, Birmingham, UK. For further information contact [email protected]
16–20 May: ScandLAS 50th Symposium, Tallinn, Estonia. For further information visit https://www.scandlas2020.ee/
27–29 May: AFSTAL Annual Conference, Marseille, France. For further information visit https://www.colloque-afstal.com/2020/
28 May: 3R Symposium – Alternatives to CO2, Bern, Switzerland. For further information visit https://www.nc3rs.org.uk/events/3r-symposium-%E2%80%93-alternatives-co2
22–23 June: LASA Animal Science (Transgenics) Section workshop: cryopreservation and IVF technology. Venue TBA. For further information contact [email protected]
22–26 June: FELASA Laboratory Animal Science Course on Primates, Göttingen, Germany. For further information visit https://www.nc3rs.org.uk/sites/default/files/documents/LASCourse_Registration_Juni2020_form.pdf
23–27 August: 11th World Congress on Alternatives and Animal Use in the Life Sciences, Maastricht, The Netherlands. For further information see http://wc11maastricht.org/
16–18 September: ANZLAA, Brisbane, Australia. For further information see http://www.anzlaa.org/
16–18 September: GV-SOLAS, Würzburg, Germany. For further information see http://www.gv-solas.de/
25–29 October: AALAS National Meeting, Charlotte, NC. Further information to follow.

Index to Advertisers, February 2020

AALAS 14, 118
Allentown Europe 9
Altromin International OBC
AnLab Ltd 112
AST2020 8
CEDS 112
Charles River Laboratories IFC
Datesand Ltd 13
ESLAV 114
Fine Science Tools GmbH 4
Granovit AG / Kliba Nafag 3
Laboratory Animals Ltd (LAL) 12
LASA 119
LBS 11
PFI Systems Ltd 10
Special Diets Services 111
ssniff Spezialdiäten GmbH 7
Tecniplast SpA IBC
ZOONLAB 15

(IFC, inside front cover; IBC, inside back cover; OBC, outside back cover)
