Determinants of host species range in plant viruses

Benoît Moury, Frédéric Fabre, Eugénie Hébrard, Rémy Froissart

To cite this version:

Benoît Moury, Frédéric Fabre, Eugénie Hébrard, Rémy Froissart. Determinants of host species range in plant viruses. Journal of General Virology, Microbiology Society, 2017, 98 (4), pp.862-873. 10.1099/jgv.0.000742. hal-01522298

HAL Id: hal-01522298 https://hal.archives-ouvertes.fr/hal-01522298 Submitted on 26 May 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License

Postprint version. How to cite this document: Moury, B. (corresponding author), Fabre, F., Hébrard, E., Froissart, R. (2017). Determinants of host species range in plant viruses. Journal of General Virology, 98(4), 862-873. DOI: 10.1099/jgv.0.000742

Determinants of host species range in plant viruses

B. Moury (1), F. Fabre (2), E. Hébrard (3), R. Froissart (4,5)

(1) Pathologie Végétale, INRA, 84140 Montfavet, France
(2) UMR 1065 Santé et Agroécologie du Vignoble, INRA, Bordeaux Sciences Agro, Institut des Sciences de la Vigne et du Vin, F-33883 Villenave d'Ornon, France
(3) UMR186, IRD-Cirad-UM, Laboratory "Interactions Plantes Microorganismes Environnement", Montpellier, France
(4) UMR385, INRA-CIRAD-SupAgro, Laboratory «Biologie des Interactions plantes-parasites», Campus international de Baillarguet, F-34398 Montpellier, France
(5) UMR5290, CNRS-IRD-UM1-UM2, Laboratory «Maladies Infectieuses & Vecteurs : Ecologie, Génétique, Evolution & Contrôle», Montpellier, France

Running title: Determinants of host species range in plant viruses

Keywords: plant virus, host range, genome segmentation, vertical transmission, vector, bipartite network

Corresponding author: B. Moury, INRA, E-mail: [email protected]

Non-standard abbreviations: HRB: host range breadth, absHRB: absolute host range breadth, relHRB: relative host range breadth.

Abstract

Prediction of pathogen emergence is an important field of research, both in human health and in agronomy. Most studies of pathogen emergence have focused on the ecological or anthropic factors involved rather than on the role of intrinsic pathogen properties. The capacity of pathogens to infect a large set of host species, i.e. to possess a large host range breadth (HRB), is tightly linked to their emergence propensity. Using an extensive plant virus database, we found that four traits related to virus genome or transmission properties were strongly and robustly linked to virus HRB. Broader host ranges were observed for viruses with single-stranded genomes, those with three genome segments and nematode-transmitted viruses. Also, two contrasted groups of seed-transmitted viruses were evidenced. Those with a single-stranded genome had larger HRB than non-seed-transmitted viruses, whereas those with a double-stranded genome (almost exclusively RNA) had an extremely small HRB. From the plant side, the family taxonomic rank appeared as a critical threshold for virus host range, with a highly significant increase of barriers to infection between plant families. Accordingly, the plant-virus infectivity matrix shows a dual structure pattern: a modular pattern mainly due to viruses specialized to infect plants of a given family and a nested pattern due to generalist viruses. These results contribute to a better prediction of virus host jumps and emergence risks.

host host Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: the extreme contrasts host range oftheir breadth Notwithstanding However, ha “potential”range species ofplant host parasites technically orethicallyfeasible. animal orhuman for pathogens, which bya controlled new inoculation pathogen infectionhost wouldhelp compare to emergenceamong risksof pathogens. different a (ii) of jump, host theyare environmental variables and influenced byhumanactivities multiple on theabov trade, oftheexpansion geographical or rangeof modification introduction ofinnew areasglobal landuse, pathogens travel through or focused on (iii) propagation in species. jumps Incase ofanimal humandiseases, the and topathogen particular attention has paid been incidence Pathogenemergencecausative processwhich increases the ofa isthe agent by in disease Introduction nd Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). ve pathogen

been obtainedbeen ” , I

nfectivity of a pathogen in agivencan ofa difficult evaluate,nfectivity to pathogen speciesbe especially Host jump where has been used for decadesfor has ascriterion used a been taxonomic following evolution of involved e - mentioned steps (i) andmentioned (ii

properti

a infecting pathogen a generates reservoirepidemics disease host and inanew ,

i.e. host host s that by controlled inoculation by controlled

its appearanceits involve three major steps: (i) encounterinvolve three major steps: (i)

ecological ecological infection of thenew determined host,is es species plant virus host plant virushost

Comment citer cedocument: host population host . As a consequence,. identifying 10.1099/jgv.0.000742 jumps jumps and

In contrast, in anew

anthrop arecertainlynot parasites range i), and t [3]. [3]. ic , or in a previously, orin existing Most Most studies ofpathogen emergence have determination of thespeciesdetermination h is experiments factors 3

heir effectsheir are poorly known

(as opposedt (as

vectors. (HRB ,

such as infrequent a ), which which

from from Thesearemostly factors influential , especially for viruses of th of and o their “realized” range) host

to a large extent to alarge extent

increase in human population, increase in human ( a tomore singlespecies than pathogen properties difficult anticipate to but see

extensive dataextensive e new host, mong plant viruses, given plant mong [5

host population host - 9] ost range ). (ii) infection

.

by intrinsic In contrast, step on [4] may

the determine .

of because

not plant

[1,2] “ for for host host be host host and

.

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: the factassayedplant species that the are not plant species and/or precision or whether (relHRB), ( eachdatabase, inthe virusspecies t putative explanatory variables In toidentify order between Relationships virus Results - - than others? - ( the plant pathogens and especially plantviruses emergence inhumans species 1 absHRB W A D VIDE database; 000 Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). re viruses likely to jump re likely viruses to jump o hat are the processes

certain types ofviruses ,

and the

(but see (but see hosts ornon hosts ),

corresponding to corresponding

the totalnumber as ofplant listed inthedatabase hosts species

accuracy issues

of plant virus emergence, plant virus we of incongruence general ofplant patterns [10,11]

[15] determinants (ii) - ) toanswerfollowing the questions: hosts [12] ). Comment citer cedocument: their distribution within plant within their distribution taxonomy

HRB . . 10.1099/jgv.0.000742

to plant species

Also, h

In addition, different were analyzed datasets of because absH between thephylogeny possess associatedwith differences in properties corresponding

was foundtobe

of plant HRB, virus we RB ost range isanimportant cause expansion ofemergence for wo HRB estimates wereHRB obtained estimates wo

the capacity toinfectrange abroaderof

divided - virus

and and analyzed belonging to

[13,14] interactions

by the total number ofby the equally to 4 HRB

a major determinanta or viral ofbacterial virus

. of

an range virushost extensive database In toimprove order evenly distributed across plantgenera across distributed evenly or genome most most

distant analyzed at the species

(i) plant the total number of assayedthe totalnumber or

taxa

between virus speciesbetween virus transmission transmission viruses

its ? assayed assayed

:

relationships relationships a rank bsolute HRB bsolute our knowledgeour of and and ,

and relative HRB ?

plant species

plant that of properties

species with potential potential their host

(

i.e . ,

For . Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: ambisense)(P RNA viruses differencethere was inthe nosignificant (3.6 12.6 plantspecies with adouble onaverage, thanviruses respectively) stranded a lower to extent, t properties suggestingmain axes, that Moreover, 4) explained redundancylimited between theexplanatory variables. entire MCA The ofthe twomainaxes Correspondence ( Analyses Methods section numbers oforder datasets of HRB Indeed,dataset species.293 comprising 480virus the whichthe 293viruswere ≥15plantspeciesthe species assayed to for inaddition families for all viruses) ransmission and the Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). andand 3.9speciesonaveragefor dsDN dsRNA

(480 species) and restricted (293 species) datasets, respectively,(480 restricted (293species) andspecies) datasets, The We estimates (Method for a diversity minimal which generafamilies, as ofplantorby assessed (ss) ,

absHRBand relHRB the analyze abs 16 genome DNA) (either composedRNA or of ir HRB % and

genome nature,genome ). 2

their d (

Hill ir of viruses of correlations the

17 vector typevector abiotic horizontal transmission horizontal modeabiotic . To tackl . Comment citer cedocument: gen % . 10.1099/jgv.0.000742 theyare

MCA S1 of the total va - and value= ) . was s Hill

To tackleissue, we restricted the analyzed second additional the

e the first issue, wee thefirst issue, used as illustrativevariableused s) (Fig. available intheonlineSupplementary S1 Material) accounted for

(variables SEG,VERand GEN, VEC, respectively)

0.097; Kruskal ir not redundantwith the explanatorynot variables highly

fam

between number of genome of segments, number the , respectively abs riation significantly HRB ofpositive HRB

the 5 2

for thesefor 3.0 explanatory - Wallis test). Wallis ) % analyzed [16] [16] - A viruses, respectively virus dataset a better provides precision and The (orders followingaxes 3and two

linked to linked had two has been( assayed 2 s 2.3 (Table

-

were

a restricteddataseta comprising variables a broader host rangeaand broader host (16.7 datasets and negative

% V four iruses of the total variationof the

poorly linked tothe two 1 ).

, respectively different viral - and ir

stranded stranded V using with three genomewith three

mode ofvertical mode iruses with iruses

revealed only - sense (or ). In contrast, ). Table S1

Multiple Multiple . (ds)

entire Hill .

a genome

and

single for the

and, to

. - Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: HRBsinglegenome thanviruses witha segment observed for and relHRB) F restricted virussets potential limiting than virusesabiotic transmission withnohorizontal ( transmission, neither vectors through (datan seeds nor range exclusively transmitted seeds, thanviruses exclusively through through or vectors, both vertically seeds through and horizontally a through biologicallargerhost had vectors underrepresentation average species onaverage) viruses correspondingother vector to types average) transmission, horizontal n groupsbetween ofseed two these had larger hostrange viruses transmitted vertically of throughtheseedcoat contamination or viruses transmitted through propagation vegetative had a broader hostrange (10.5 to15.7species onaverage) segments or Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). all

these restrictedthese The robustness ofthe l ) but than had a significantlyhad broader hostrange

did not depart significantlyvirus depart did not groups other from the and the explanatory variables SEG, VER and VECexplanatory and variablesSEG, the viruses transmitted bycontact plantshad a between other types ofviruses, types whereasother little entire . Thrips

in the dataset s

than datasets datasets dataset, dataset, Comment citer cedocument: (20.2average species on ematode

10.1099/jgv.0.000742 - other transmitted viruses ink meaning that (13/13) . Concerning vertical transmission, seed. Concerning transmission, vertical virus s - (six

- transmitted viruses (datatransmitted notshown). between borne viruses had a virusesborne broader hostrange

precision orprecision

virus species , group

the links betweenthe links ot shown; ot shown;

or with noknownvectoror with these these

s and nosignificantand wass difference observed viruses with three genome withthreeviruses segments 6 had

(28.3 species(28.3 onaverage) ,

virus traits and HRB examined virus traits was ) seed accuracy issues

abs [17] than viruses withnoverticaltransmission

broad only 16.5 (12.7 HRB - transmitted viruses ). )

Concerning abiotic horizontal .

vs the two host range host Inaddition, viruses transmitted

differences were observeddifferencesamong were to 13.5 . 14.4species,respectively

were maybe

slightly HRB ( Method species s (from 10. (from or

(38.3 species(38.3 on similar similar

of the seedof the embryo estimates Concerning biotic biotic Concerning because oftheirbecause - transmitted viruses transmitted broader hostrange than (33.5 species on

had a

on average S1 to those

9 to15.1 and other groupsother larger HRB

had a (absHRB

Table S1 with ) ). .

larger Both

).

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: virus in dichotomies third VEC variable 480 combination take account into the interactionsbetween explanatory variables significantly computedWe larger HRBthan revealed significant withnon differences inHRB, transmission poor capacity groups. differences true between todetect virus (four virus underrepresentation of viruses with than virus detected. detected with the Kruskal ds ss significantgroupsamonggenome virus differing differences by their nature, of vector showed lessrobust effects. transmission, arthropod than viruses and withnoverticaltransmission nematode Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). RNA genome

the tree genome. - virus dataset

However, the was trend However, and similar - es dataset borne virusesborne were groups) less these in es two or

In

s , a similar and simpler regression simpler , a and similar was (Fig relHRB obtained for tree with , onlywhich comprising ≥15plantspecies viruses datasets for explanatory variablesexplanatory

of variables and separate conditional inference six additionalsix

linked to the linked to having a

, the first dichotomy tree, thefirst intheabsHRB regression for a byviruses transmitted in vectors a were ds

genome.

linked tothe linked Comment citer cedocument:

(Table S (Table

significantly - and multiple test Wallis d 10.1099/jgv.0.000742

nematode datase VER

variable This lack was ofrobustness 1 ) variable ts ( . ds

GEN, SEG,VER GEN, regression treesto Two other explanatory variables, genomeexplanatory typeTwo variables, otherand , SEG and GEN

an overall effect of virus genome naturean onHRB effectofvirus overall RNA or - larger HRB than at least virusgroupslarger HRB oneofwith the

borne viruses from viruses others (Fig.borne level

and consequently

and, s viruses with viruses correspond but no differences groups between virus were 7 ds

again DNA genomesDNA - vectoreda viruses having Only datasetsrevealed 3/13ofthese

non variables

, and to the SEGvariable synthesize - - circulative

borne virusesborne to higher

VEC a ss

a statistical lack power, of probably , respectively. , respectively. RNA genome RNA genome For )

in many of thesein many datasets of and indicate on HRB.

the effect of the effect the kind of vectorthe kind or manner due to

lower HRB.lower

1 had a was ). Regression trees

. a Theand second .

wereassayed

More distalnodes

With With the

had a linked tothe sig

larger HRB than viruses with which the nificantly 1 the

b With the With larger HRB four ) or 293 i.e. main was kind -

a a

a

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: plant taxon widespreadand/or stronger. We Plant determinants of families inMethodS2. arealso presented provided inMethodS2. absHRB randomly class model wasto 0.48compared VER and VEC class 2 transmitted dsRNA viruses, the analyses (Table borne and seed genera S2 Differences in these biological properties virusHRB genomeexplain segmentation to Overall, theregression trees strengthen vectors the natureofvirus and importance ofthe of TheabsHRB treewere notshown). nodes (data ofthe linked tothe Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). ).

small examined at whichplant taxonomic Nepovirus

: ( A model was built to predict theabsA model topredict was built

6 to 15 host species;6 to15host class 3:> Begomovirus er

HRB richness andabundance HRB -

(Method S2). S2). (Method was theonly transmitted, twotraits thatwere linkedinthe tolargeprevioustransmitted, ranges host 1

of alphacryptovirusesof (1. ; Table S1 ; Table

were also comparedwere between ,

C Comment citer cedocument:

virus virus . arlavirus Additional models aimingof host genera theor diversity topredict A d 10.1099/jgv.0.000742 correspond a to ). compared to Four differentFour genus etailed analysis predictive ofthe of the performance modelis a rate with a of random 0.33obtained choosing classifier The rate overall assignmentscorrect of absHRBclasses to bythe host rangehost In the480 ,

Potyvirus

revealing a revealing

were calculated for the hosts of eachwere for calculated thehostsof vir 15 host species)15 host from v the four - rank

virus dataset,the only additionalwas effectdetected

most other most

2 host species onaverage), species 2 host

higher or lower higher Hill Hill HRB class of a virus(classHRB species; of host class 1:≤5

and emphasize and emphasize and

the

8 significantly numbers

Tymovirus barriers by viruses toinfection the most representedthe most

virus generashown).virus not (data [16] [16]

HRB ). which combinations between broade

of order q=3, q=0 to Nepoviruses are nematode .

r VEC and SEG and VEC

host rangehost a group of seedgroup a ariables GEN, SEG, virus generavirus us speciesat and

than other than are

integrat variables

more more ( - Table the -

ing .

different plant taxonomic ranks (Fig. 2). Depending on q, the Hill numbers place more weight on host taxonomic richness or on evenness of distribution. However, whatever q, the ranking of the Hill numbers and the significant differences between taxonomic levels were remarkably similar. A sharp decrease of host diversity was observed between the genus and family levels. By comparison, no significant difference in host diversity was observed between the family and order levels, and only a small but significant diversity reduction was observed at higher taxonomic ranks. These results indicate moderate barriers to infection at the within-family rank, virus species being generally infectious in several plant genera, and much more frequent and/or stronger barriers to infection at the between-family rank. Additional barriers to infection occur at higher plant taxonomic ranks. Consequently, the plant family rank appears as a key threshold for plant virus host range evolution.

Structure of the plant species – virus species infectivity matrix

From the entire plant species-virus species infectivity matrix obtained from the database, a smaller matrix (37 virus × 28 plant species) containing relatively few (<10%) missing host/non-host data was extracted and analyzed. This matrix was analyzed for nested or modular structural patterns, two important properties that correspond to two contrasted models of host range evolution [18]. Then, the matrix was shown to be highly significantly nested (P-value <0.001) (Fig. 3a), whatever the algorithm used, the way to simulate the status of the missing data and the null model. Concerning modularity analyses, the edge.betweenness algorithm failed to detect any module in the matrix. For the other two algorithms (spinglass.community and leading.eigenvector.community), modularity was significant in most simulation cases.
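As a rough illustration of what a modularity score measures on a virus × plant infectivity matrix, the sketch below computes Barber's bipartite modularity for a partition that is supplied by hand. This is an assumption made for illustration only: the algorithms named in the text search for a partition and are not reproduced here, and the toy matrix, the module labels and the function name `barber_modularity` are hypothetical and do not come from the database.

```python
def barber_modularity(matrix, virus_modules, plant_modules):
    """Barber's bipartite modularity Q for a fixed, user-supplied partition.

    matrix[i][j] is 1 if virus i infects plant j, else 0.
    virus_modules[i] and plant_modules[j] are module labels.
    """
    m = sum(sum(row) for row in matrix)  # total number of infections (edges)
    if m == 0:
        raise ValueError("matrix contains no infection")
    virus_degree = [sum(row) for row in matrix]        # hosts per virus
    plant_degree = [sum(col) for col in zip(*matrix)]  # viruses per plant
    q = 0.0
    for i, row in enumerate(matrix):
        for j, a_ij in enumerate(row):
            # Only virus-plant pairs placed in the same module contribute.
            if virus_modules[i] == plant_modules[j]:
                # Observed link minus the link expected from the degrees alone.
                q += a_ij - virus_degree[i] * plant_degree[j] / m
    return q / m


# Hypothetical toy matrix: two groups of "specialist" viruses, each
# infecting only its own group of plants (4 viruses x 4 plants).
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
two_blocks = barber_modularity(toy, [0, 0, 1, 1], [0, 0, 1, 1])
one_block = barber_modularity(toy, [0, 0, 0, 0], [0, 0, 0, 0])
print(two_blocks, one_block)
```

In this toy case the two-module partition scores higher than lumping all species into a single module, which is the kind of contrast a modularity analysis looks for. Assessing significance would additionally require comparison against null-model matrices, which this sketch does not attempt.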

Modularity P-values ranged from 0.006 to 0.078 (Table 2). In only one of eight combinations of modularity algorithm, way of simulating missing data and null model was the significance slightly above 0.05. The maximal modularity (0.14 for the spinglass.community algorithm) was obtained when plant and virus species were distributed in three modules (Fig. 3b). There was a strong association between modules and botanical families. Module 1 contained a majority of Solanaceae (7 of 11), whereas modules 2 and 3 comprised almost exclusively Fabaceae and Amaranthaceae (7 of 8 and 8 of 9, respectively) (Fig. 3b). There was no obvious property shared by viruses belonging to the same module.

These two patterns reflect two groups of plant viruses (Fig. 3b). On one hand, "generalist" viruses with broad host range were identified within module 1 and tended to possess the capacity to infect plants of any of the three modules. These viruses contributed mainly to the nested pattern of the matrix. On the other hand, viruses belonging to modules 2 and 3 were "specialist" and mostly able to infect plants belonging to their own module, and ensured the modular pattern of the matrix.

Discussion

A major asset of the VIDE database is the compilation of experimental, rather than only natural, host range data. This allowed us to identify virus properties linked to the intrinsic capacity to infect a set of plants, irrespective of the exposure of these plants to the virus in natural conditions. Consequently, our analyses can be useful to estimate the risk of future emergence in a new plant species for a plant virus possessing a given set of traits if exposure conditions change. As for any large dataset, there are a number of pitfalls to avoid in the analyses and interpretation of the results. These pitfalls can be related to (i) the exhaustiveness

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: of plants were used tocharacterize virus which the the impact of plant below). taxa families. quite and exhaustive themissing virusspeciesquite evenly are distributed characterize tothe detriment ofhostrange data. virusspecies beshould emph in the dataset species. begomovirus from speciesabsent the d Geminivirid and thepresent taxonomyspeciesgenusis thehuge of inthe increase Endornavirid (29species families acceptedare todate40%of theirexcept byspeciesmembers, represented at least number species (900) virus of plant since. data of the Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). base when

specie database It Concerning The

Twelve includes range host 480 which virusspecies, for represents 53% ofthe data total .

However, care abs few s database is based on the literaturedatabase1996 and isbased has on been not availablein updated ae),majority thewe data for If fewHRB ofwhich are withdraw167 available. the , of a total of 85genera, ofa total of (0 ofTheae major 4). quantitative difference which HRB this sourceofimprecision,this

HRB data data HRB and to (ii) theprecisionandaccuracy obtained to(ii) and from HRBestimates (iii) the of asized that virologists use more and more genome moreasized thatvirologistsand usemore data sequence to

virus generavirus

and relHRB the precisionof HRB estimates varies Comment citer cedocument:

should betakeninterpretation aboutshould results the ofour are

between viruses of 196),6) of8), Bunyaviridae Ophioviridae(0 and(2 10.1099/jgv.0.000742

available due to the peculiaritiesavailable duebiology tothe of thevirus (see which were well fairly correlated,representative indicating sets that

accepted include few species (2.3 on average) fewspecies(2.3are notrepresented on include .

Concerning the plant species used to estimate HRB, Concerning plantspeciesestimate it the used to atabase, our datasetof the atabase, total virus our 66% represents

HRB we defined awe defined

and can be for small and some can ofthem in the in the

11 ( Method

,

latest a limiting reduced

S1

ICTV report

) between the viruses inthedatabase theviruses between .

Then,

In database summary, is the

factor is set of we analyzedwe separately 29 Begomovirus [19 the number of the number 3

plant viruses forplant - across 20] . . for some virus

To minimize To minimize The 19

genera and

(family assayed

virus the

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: virus inf andgeneral seems be to were significantlyHRBnotshown). tovirus still (data linked nepoviruses from the 480 member species) diversity. extremeHRBparticular rather, values with but, the virustaxa were shared throughout robustness ofour results. number ofvirus species VEC subsets, r increasing diversity genome or transmission accuracy of they significantHRB the three for S1 abs Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). ) HR . Impor would have been would have

variables.For thealso variable, results similar GEN were B A Importantly, observednot dueto relationships traits betweenandwere viral HRB the The ection within at the plant in

esults and distribution and distribution strong Nepovirus

tantly, thresholds for thresholds for

the 480 unequal HRB were link betweenlink that exhibited a significantlythat exhibited most estimation -

distribution of the distribution and 293 and was the only virus was similar to those obtained with the tothoseobtained with similar

affected by

relationships between HRBrelationships and virus biological properties Comment citer cedocument:

in the differentgenome groupsof nature properties for plant parasitesfor

Hill - of

or 293 10.1099/jgv.0.000742 . assayed - gen virus data

Consequently estimates plant taxonomyplant .

and - - genus butfrequent between rank infection barriers virus datasets,thefour imprecisions using

plant species plant Hill assayed . The robustness ofthesetherisk thatrobustness results. Theminimizes sets genusfairly representeddatabasewell inthe

fam datasets with increasing termsdatasets in of requirements

[21] , . and , respectively) we 12 and larger

. Overall, there plant species in HRB estimation the relthe analyz

amonggenera and plant families virus host range virus host

HRB (Table removingHRB S2).After HR ed entire

variables GEN, SEG,VER andVECvariables GEN, B the links between virus HRB betweenandthe links

(Table S1)

in across

dataset

the 29 were when there .

Again, this stre Again, this .

was evidenced in ourstudyevidencedwas plant tax

relatively 3 for the SEG, VERandfor the .

- For all these virus Forall virus datasetvirus was a may affectmay few

a sufficient ngthens the

were highly barriers to

( (Table i.e (≥12 plant . with

the

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: hypotheses theincrease canto explain raised be of broader acids than those witha that viruses is explanation witha ss bet genome range had broader host determinants ofvirus 1 and nature and and viruses of the which However jumps. host no virus taxon Each analysis the of was hostrange.threshold virus thatoftenconfirmed,ona limits smaller This families (Fig. Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). RNA viruses pathogensOne are animal possible toemerging the prone most viahostjumps. t ween the

/or to

detected [24]

Table S1 three(Fig. modules fungal taxa was to“g mostly due selectiveforces have The segmentationgenomes ofwas virus also linked to strongly identifiedWe suggests thatsome viral plant plant . However, why thisputative explain mechanism not virusesa does have ssDNA segmentation, occurrence andvertical vector modeofsegmentation, and transmission type . se In tree ho 2

module two groups two ) plant ). . st range dsRNA viruses,they since than

At this stage,canAt this virus we traits speculateas only four are these towhether The ds [22] - fungus interactions,fungus

- genome genome virus

, the plant , the several HRB plant plant . was Consequently, planttaxonomy Comment citer cedocument:

(Table 3 infectivity

10.1099/jgv.0.000742

stronglyassociated with familyas can be rank a considered criticaltaxonomic therefore or if they areor ifwith they linked ) eneralist” viruses that wereeneralist”able viruses toinfectplants belonging that determined rangecontrasted host breadth among plantviruses . [23]

virus traits virus The generalist apparent and between dichotomy specialist - s virus matrix presentedvirus matrix alsonested pattern asignificantly traits

than viruses witha than viruses , perhaps a consequence higherof instability ofthe ss 1 )

.

genome tendtohave evolution rate higher mutationand

In

matrix (Fig.matrix (including potentiallythoserevealedinourstudy)

accordance the detected robustly 13

and 3 , HRB ).

a

Woolhouse Woolhouse ds modules HRB

The matrix wasThe significantly matrix modular. strongly linked totheir plant may genome

for viruses with two, andfor withtwo, especially viruses share

for other reasons.for other

famil be used as a first predictorfirst asofvirusbe a used werealso linked

similar evolution rates evolution similar ,

et al. with a3 y

but not with but not

[1] [1] HRB -

observ to 4 to . Several . Several Viruses witha HRB data - fold difference

to a particular ed

host taxa but taxa host set, the by :

that genome ss [23] to

nucleic

(Table (Table , any . .

ss s

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: specialization. intuitive narrower hostranges high which theadvantagesgenome ofsegmentatio segments increases. MOIminimum forrequired infectiongenome rapidly very wouldincrease as of thenumber over time and O effective environment between can for advantageous virusisolates,adaptation which be expression replication rapidity was proposed that segmentation Hypotheses advantage regarding the of withdrawingviruses thesefrom threethe datasets( HRB Kruskal Cucumber mosaicvirus virus genomewith three (Table segments Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). nly number of a small

hence themultiplicity is of infection(MOI) er es values.

MOIs MOIs

The with three genomewith three segments that - , Wallis test Wallis

population size ofplantpopulation size viruses , since

for plantviruses [29] [29] larger hostrange

( Moreover, the Moreover, the [33] th

For offamily example, dsRNA plantviruses genera and of seed transmission throughseed transmission theembryo assoc isoften e so and, . I .

nterestingly, the observation that viruses with >3 genome viruses segmentsnterestingly, with>3 theobservationthat [26 - used T

called “advantage ofsex”; , on average,, on o his suggests an optimum number of genomehis suggests of number an segments optimum n a longer term - , Comment citer cedocument: 28] virus

Tomato spotted spotted Tomato is based onranks and 10.1099/jgv.0.000742 . Genome segmentation [31 SEG variable of of

particlesto the infection contribute of an individual - 32] seed increase

lends .

1 A consequence - transmitted viruses , ). could favor

virus

It is support

s wilt virus wilt possess may be the main limitation ofgenomemay mainlimitation be the

remained significantly linked to genome stability,genome replication fidelityand/or noteworthy g n are notcounterbalanced necessityn are by the of 14 enome segmentationenome therefore notartificially therefore

[30] now considerednow tobe to to

extremely ranges, large host

genome exchanges through reassortment P this hypothesisthis and could also

). of virus genome that is of virus segmentation - value On the opposite Alfalfa mosaic virusAlfalfa

that this was due this not toa that (Table 1; s = 1.2×10 allow betterregulationgene of (Table [17] iated with host with host iated are longstanding - 3

, to 3.7×10 low, though low, ) the lowwithin in achanging influenced seems counter 1 HRB ). . Indeed,. the

(3 or 4) especially -

segmentation 3 after ). number of plant

variable by extreme by , for

[25] - host host -

have cell

. the It

.

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: that of extremely narrow hostrange. vectors large rangesexcl exhibit thatare host and those transmitted viruses be should distinguished subset theirIndeed, species. host many are transmitted only through plantviruses inasmall seeds with abroader hostrange have of being more opportunities seed An alternative withac hypothesis, with novector or transmission withexclusive viruses transmitted boththroughhave hostrange and thanviruses seeds a by vectors broader ( Thisvirus dissemination. evolutionary only interest would be withmixed trend of for viruses negativecompensated hostrange, effectcouldbe by opportunitie aoffering more broader decreaseof infecting theprobability through new plants horizontal t transmitted viruses could belinkedtotheprobability of transmitted viruses. Endornaviridae,which may ledto have data spread,of horizontal mechanical transmissi pollen tothe seed hostrangeto and their isrestricted Alphacryptovirus rade i.e. Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

set both vertical and horizontal) transmission modes. transmission andboth vertical horizontal) - off off

o

viruses transmitted byviruses transmitted Finally, contained only 22 memberscontained of 22 only f their hostspecies. has

tr ansmission inansmission

been observed between theefficiency observed been of t he host range of nematode range host he of

and

An ultimate cause hostrange ofthe of broader Betacryptovirus Comment citer cedocument: some 10.1099/jgv.0.000742

Overall,that twodifferent wegroups postulateofseed

arthropod

plant viruses ausal link in the opposite direction, be opposite thatvirusesausal could the in link

the ( s, fungi family a generala family Partitiviridae and no member of andfamily nomember Partitiviridae - : T on and even graftevenon and [19] transmission transmitted viruseslarge wastransmitted as about as twice [34] hose thatarealsohose transmittedhorizontally by 15 vector transmission or with or

overestimation oftheoverestimation average . Asa couldtransmission consequence, seed ca

vertical Confirming this hypothesis, plant Confirming hypothesis, this out usively vertical transmitted . one infecting plants.A healthy new, horizontal

vector )

plant species the because lack of are byand transmitted by ovule seed seed - s transmitted inatleast oneoftransmitted

( (Table

data not shown anddata not some groups of some transmission. This transmission. transmission transmission 1 ;

[17] . The 4 HRB ) and .

the Again, an ly ly - seed

show of seedof family 80 [17] - - virus s

an for ).

-

An explanation for the broader host range of nematode-transmitted viruses could be linked to the poor migration capacity of their vectors (a few centimetres per year in uncultivated woodland habitats; [35]). Because of the poor probability of infecting new plants, nematode-transmitted viruses could have evolved a broader host range to increase their chance of infecting new plants and of not becoming extinct after the death of the infected plants. The same tendency was not observed for other soil-borne viruses, such as those transmitted by fungi (Table 1), which may be due to the fact that fungal vectors have a higher dissemination capacity than nematodes and/or that nematode-borne viruses are more prone to […].

Methods

Plant virus database

Plant virus host range data were obtained from the VIDE (Virus Identification Data Exchange) database [15], which includes a list of host and non-host plant species for each virus species. Host lists for a given virus include both natural and experimental hosts, whereas non-hosts result only from controlled laboratory inoculation experiments. Local-lesion hosts, in which the virus multiplies and moves from cell to cell in the inoculated organs, are considered as hosts in the database. This is justified by the facts that (i) local lesions are, to some extent, determined by hypersensitive reactions triggered by gene-for-gene interactions between the plant and the virus [36,37], (ii) mutations in the plant resistance gene and/or in the virus avirulence gene may allow full systemic infection [38-40] and (iii) environmental conditions can abolish the expression of the resistance in local-lesion plants and lead to full susceptibility [41].

A few plant species have been included both in the host and non-host lists for a given virus; these were considered as host species. Indeed, these differences were due to the choice of plant and virus genotypes in different studies and are probably the result of intraspecific variability affecting host resistance and/or virus pathogenicity.

The list of host and non-host plant species for each virus species was copied from the database to Excel in April 2009 and then formatted using the R software version 3.0.2 [42]. Among the virus species described in the database, we kept only those considered as definitive or tentative species in the latest ICTV (International Committee on Taxonomy of Viruses) report [19]. Virus taxonomy at the genus or higher ranks was as proposed by ICTV. Biological properties of viruses were obtained from Brunt et al. [15] and King et al. [19]. A total of 480 viral species contained host range data in the database.

Estimating host range breadth of plant viruses and tackling of potential biases

The most obvious estimate of HRB is the absHRB of virus species. However, this estimate can be affected by accuracy and precision issues (see Method S1 for justification). First, the total number of assayed plant species varies greatly between virus species, which affects the precision of the HRB estimate. For example, absHRB usually underestimates the HRB of viruses for which experimental inoculation is difficult to implement (no artificial inoculation method, vector unknown, vector difficult to raise, no horizontal transmission). To tackle this issue, we analyzed the relHRB. To ensure a satisfactory precision for relHRB, we restricted the dataset to the 293 virus species for which a minimal number of n = 15 plant species were assayed (Table S1).

Second, the distribution of assayed plant species within plant taxonomy also varies greatly between virus species, which can be a source of bias. For example, if a large number of plant species of the same genus as the one where the virus was initially isolated are assayed, this may artificially increase the HRB estimate because there are relatively fewer infection barriers at this plant taxonomic rank.

To tackle this bias source, we restricted the dataset to virus species for which a minimal diversity of plant genera or families was assayed, as assessed by Hill numbers [16] (Table S1). Hill [16] integrated taxon richness and abundance into a class of diversity measures (Hill numbers), of increasing use in ecology, defined for q ≠ 1 as:

$$ {}^{q}D = \left( \sum_{i=1}^{S} p_i^{\,q} \right)^{1/(1-q)} \qquad (1) $$

in which $S$ is the number of taxa in the sample and the $i$th taxon has relative abundance $p_i$ ($i = 1, 2, \ldots, S$). The parameter $q$ determines the sensitivity of the measure to the relative frequencies. When q = 0, the abundances of individual taxa do not contribute to the sum in equation (1), so that ${}^{0}D$ corresponds to taxon richness. For q = 1, equation (1) is undefined, but its limit as q tends to 1 is the exponential of the Shannon index:

$$ {}^{1}D = \exp\left( - \sum_{i=1}^{S} p_i \log(p_i) \right) \qquad (2) $$

The variable ${}^{1}D$ weighs taxa in proportion to their frequency. For q = 2, equation (1) yields Simpson diversity, which places more weight on the frequencies of abundant taxa and discounts rare taxa. Usually, a characterization of the taxon diversity of a sample with $S$ taxa and relative abundances ($p_1, p_2, \ldots, p_S$) is conveyed by a diversity profile (a plot of ${}^{q}D$ vs q from q = 0 to q = 3) [43].

For each virus species, the diversity of assayed plant species was calculated at the genus and at the family levels ($D_{gen}$ and $D_{fam}$). In the analysis of the assayed species, we found that Hill numbers of order q = 0 to q = 3 were highly correlated (r > 0.90) at the genus or at the family level, but less correlated between the genus and the family levels (data not shown). Consequently, we chose to analyze the robustness of our results with different thresholds of minimal Hill numbers $D_{gen}$ and $D_{fam}$.
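As an illustration of equations (1) and (2), the following minimal sketch computes Hill numbers and a diversity profile from q = 0 to q = 3. It is not the authors' code (the original analyses were run in R); the function name `hill_number` and the list of assayed genera are hypothetical and serve only as an example.

```python
import math
from collections import Counter

def hill_number(counts, q):
    """Hill number of order q for a list of taxon counts (equations 1 and 2)."""
    total = sum(counts)
    # Relative abundances p_i; taxa with a zero count are ignored.
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit of equation (1) as q tends to 1: exponential of the Shannon index.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical example: genera of the plant species assayed for one virus.
assayed_genera = ["Nicotiana", "Nicotiana", "Nicotiana", "Solanum",
                  "Solanum", "Chenopodium", "Cucumis", "Phaseolus"]
counts = list(Counter(assayed_genera).values())

# Diversity profile from q = 0 (taxon richness) to q = 3.
profile = {q: hill_number(counts, q) for q in (0, 1, 2, 3)}
for q, d in profile.items():
    print(f"q={q}: D={d:.3f}")
```

With q = 0 the function returns the number of distinct genera, and the values decrease as q increases because rare genera are progressively discounted.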

Relationships between host range breadth and biological properties of plant viruses

To unravel virus properties explaining differences in HRB, the response variables absHRB and relHRB were compared among groups of viruses sharing the same biological properties or traits (putative explanatory categorical variables). These traits represent major life history or genome properties of plant viruses:

- the genome nature, categorized as ss or ds RNA or DNA (variable 'GEN');
- the number of genome segments, the viruses with >3 genome segments being grouped into the same category because they were underrepresented in the database (20 species) (variable 'SEG');
- the occurrence and mode of vertical transmission, categorized as (i) seed transmission and (ii) no vertical transmission (variable 'VER');
- the occurrence and mode of abiotic horizontal transmission, categorized as (i) graft transmission (when some host plants are usually multiplied by grafting) but no substrate, tool or contact transmission, (ii) substrate (i.e. soil or irrigation water) transmission but no tool or contact transmission, (iii) tool transmission but no contact transmission, (iv) contact transmission between plants and (v) no abiotic horizontal transmission known;
- the vector type, categorized as (i) aphid, (ii) whitefly, (iii) other Hemiptera, (iv) Coleoptera, (v) thrips, (vi) mite, (vii) nematode, (viii) fungus or soil-borne protists (named collectively fungus for simplicity) and (ix) no vector known;
- the vector type defined as previously but with arthropods grouped into a single category (variable 'VEC');
- the mode of vector transmission, categorized as (i) non-vectored, (ii) non-circulative (non-persistent or semi-persistent), (iii) circulative (i.e. persistent but non-multiplicative) and (iv) circulative multiplicative.

Variables linked to nucleotide composition or within-species genetic diversity or evolutionary constraints were not included because of a lack of data for a large set of viruses in the database. Additional variables either did not provide consistent results (number of vector species) or are confounded with other variables (genome circularity and linearity) and are therefore not presented.

All statistical analyses were performed with the R software version 3.0.2. First, we performed MCA using the package 'FactoMineR' to analyze the correlations and putative redundancy between explanatory variables. Then, the analysis of the links between HRB estimates and each explanatory variable was performed. Since residues of linear models explaining absHRB or relHRB with any of the explanatory variables departed significantly from a normal distribution, we used non-parametric Kruskal-Wallis tests, using the package 'pgirmess', to assess the significance of the link between HRB estimates and each explanatory variable and to compare the HRB between virus categories. Additionally, regression trees were realized to better explore the effect of the most significant explanatory variables on HRB. We used the conditional inference trees method 'ctree' implemented in the package 'party' to describe the conditional distribution of absHRB and relHRB as a function of the four categorical variables GEN, SEG, VER and VEC, with a minimum number of 15 viruses in each terminal "leaf" of the tree and the default setting for the other parameters.

Relationships between plant taxonomy and virus host range

Different diversity indices were calculated to unravel links between the host status of plant species and their taxonomic proximity. For each virus species, host diversity was calculated with Hill numbers of orders q = 0 to q = 3, as presented before. The mean Hill numbers obtained at the species, genus and family ranks for all virus species were compared among plant taxonomic ranks with Kruskal-Wallis tests. Plant taxonomy was according to Brunt et al. [15] at the species or genus levels and according to Watson & Dallwitz [44,45] and Stevens [46] at higher ranks.

Structure of the plant-virus infectivity matrix

Infectivity matrices, where a set of parasite species or genotypes are confronted to a set of host species or genotypes, contain binary data (1 for hosts and 0 for non-hosts). Structural patterns of such matrices, notably their nestedness and modularity, vary depending on host range evolution [18; 47-49]. In a nested pattern, the host range of more specialized viruses is a subset of the host range of less specialized viruses, leading to a stair-shaped pattern for host cases. This pattern corresponds to a host range expansion evolution. In contrast, in a modular pattern, each virus is specialized to infect one (or a small set of) host species, corresponding to a host shift pattern of evolution where viruses become specialized on new hosts at the cost of losing infectivity on older hosts.

Methods to estimate nestedness and modularity are described in detail in Weitz et al. [18] and were computed using the 'bipartite' and 'igraph' packages, respectively. Because these methods do not accept missing data (e.g. plant-virus combinations for which the host status is unknown), the first step was to extract a sub-matrix from the database with a good compromise between the number of plant and virus species and the amount of missing data. For this, the lines and columns of the initial matrix were permuted to rank virus and plant species by increasing numbers of missing data, evidencing a 37 virus species × 28 plant species subset containing only 9.8% of missing data, which was subsequently analyzed. Attempts to analyze larger matrices were not successful because of too many missing data (data not shown).

Then, we simulated the host/non-host status of missing data. Two approaches were followed: (i) the host/non-host status of missing data was set as proportional to the total amounts of host and non-host cases in the rest of the infectivity matrix (Bernoulli model) and (ii) each plant-virus combination missing in the matrix was assigned a probability of being a host case which was equal to the mean of the frequencies of host cases in the same line and column of the initial matrix (probabilistic degree model). In both cases, 100 filled matrices were simulated.

For statistical significance assessment, the nestedness and modularity of the 100 filled infectivity matrices simulated as described previously were compared to two different null models [18]: (i) the Bernoulli random null model, where the same total number of host cases was randomly distributed in matrices containing the same number of lines and columns as the filled matrix and (ii) the probabilistic degree null model, where each plant-virus combination was assigned a probability of being a host case corresponding to the mean of the frequencies of host cases in the same line and in the same column of the filled matrix. Each of the 100 filled matrices was compared to 1000 matrices generated under both null models.
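The two approaches used to simulate the host/non-host status of missing data can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their analyses used R): the function name `fill_missing`, the encoding of unknown cells as `None`, the layout of the matrix as a list of rows and the toy matrix are all hypothetical. The sketch covers only the filling step, not the nestedness or modularity computations.

```python
import random

def fill_missing(matrix, model="bernoulli", rng=None):
    """Return a copy of a 0/1 infectivity matrix with None cells filled at random.

    matrix: list of rows holding 1 (host), 0 (non-host) or None (unknown).
    model:  "bernoulli" uses the overall host frequency of the known cells;
            "degree" uses the mean of the host frequencies of the row and column.
    """
    rng = rng or random.Random()
    known = [v for row in matrix for v in row if v is not None]
    overall = sum(known) / len(known)

    def freq(values):
        vals = [v for v in values if v is not None]
        # Assumption: fall back on the overall frequency if a line has no known cell.
        return sum(vals) / len(vals) if vals else overall

    row_freq = [freq(row) for row in matrix]
    col_freq = [freq(col) for col in zip(*matrix)]

    filled = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, v in enumerate(row):
            if v is None:
                if model == "bernoulli":
                    p = overall
                else:
                    p = (row_freq[i] + col_freq[j]) / 2
                v = 1 if rng.random() < p else 0
            new_row.append(v)
        filled.append(new_row)
    return filled

# Hypothetical toy matrix with two unknown cells.
toy = [[1, 1, None, 0],
       [1, 0, 0, 0],
       [None, 1, 1, 1]]
rng = random.Random(1)
filled_matrices = [fill_missing(toy, "degree", rng) for _ in range(100)]
```

Each of the 100 filled matrices keeps the known cells unchanged and differs only in the cells that were unknown, which is what allows nestedness and modularity to be evaluated across the range of plausible completions.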

(ii) the probabilisticdegree model, null for where

missing the same totalnumberthe ost cases inthesame data . Two Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: Not applicable statementEthical The interest of Conflict comments themanuscript. on KarineBerthier, FargetteLecoq and for Hervé thisstudy thespark of and toThierryCandress Thanks Véronique to Decognetefficientdatab forhelpinthe management of the Acknowledgments play programme of This workInfectieuses wasby supported (Maladies interdisciplinary theMIE Emergentes) Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

authors ofinterest. noconflict declare any role in the study orinthepreparationany articledecision role of inthe study the or topublish.

CNRS (CentreNationallaRechercheCNRS de Scientifique). Stéphane BlancStéphane

Comment citer cedocument: 10.1099/jgv.0.000742

and Julien Papaïx Papaïx and Julien

23

for help analyses in constructand/or

The funders funders The e, Mark Tepfer, ase

did not did not ,

to Denis to Denis ive Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: Trends Ecol Evol pathogen plants: of diseases infectious 13. pathogens 12. mastrevirusesplant hosts withtheir 11. 602. 10. evolutionary of inaplant a jump reconstruction virus host 9. of adaptability 2012 the modulate contingencies 8. range 2011 host its extend to genes nonconserved multiple acquiring 7. Microbe Interact of 6. J Gen Virol ability of 5. of plant viruses 4. 2008; 3. ecology and evolution 2. evolution ofspecies jumps 1. References Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

aaa igpt virus ringspot Papaya

; ; 8 108 5 : : Anderson Anderson PK MEJ Woolhouse B Wu A Gibbs N Vassilakos N Poulicard CJ Robertson S, Tatineni Chen KC N Suehiro J Dijskra Pulliam SJ Scharg DT, Haydon MEJ, Woolhouse e1002482. 80 Turnip Turnip mosaic virus : . 17366 - 91. Emerg Infect Dis Emerg Infect

2004 ,

. Melcher U Melcher . – JRC Arch Virol

, ;

. 2008 Evolution and origins of tobamoviruses of origins and Evolution 17371. 85: 2004

Chiang Chiang CH , , Importance of host ranges and other biological properties for the taxonomy the for properties biological other and ranges host of Importance

Natsuaki inr P Wiener , 2087 , . ,

Pinel . Simon V Simon ; ; framework predictive a toward Moving jumps: host Viral Cunningham AA 21 Trends Ecol Evol 19 Comment citer cedocument:

, :

: - 1046

. 535 10.1099/jgv.0.000742 2098. , Gowtage

- eemns ot pcfct fr neto o papaya of infection for specificity host determines 1992 Trends Ecol Evol az A Galzi

Guo X Guo 2005

to infect , – T .

– Raja JAJ 544. , , of roles relative the are what disease: infectious Emerging 1057. ; Tzima A Tzima Suppl5 aaae T Watanabe ; 11 . ,

,

- BMC Evol Biol , ane SM Garnsey : eura S Sequeria Wang X Wang 1842 Brassica Traoré O Traoré

pollution, climate change and agrotechnology drivers agrotechnology and change climate pollution, Antia :

, , 279 1995 Patel NG Liu Liu FL , - 1847. Johansen E Johansen -

2005; 289. 24 ;

,

10 R spp ,

Fan L L Fan

, . ie elw ote virus mottle yellow Rice kd S Okuda .

:

, 319 inl F Vignols

E ot ag ad mrig n reemerging and emerging and range Host

, Tai Tai CH and/or 20

merging pathogens: the epidemiology and and epidemiology the pathogens: merging , 2008 asn WO Dawson Morales Morales FJ - : 238 324. t al. et ,

; . Moury B Moury . 8 Raphanus sativus -

Phil Trans R Soc B Soc R Trans Phil .

Mol et al. 244. :

335. n motn dtriat f the of determinant important An ,

Assessment of codivergence of codivergence of Assessment Ghesquière A Ghesquière

Biol EvolBiol

13. A single amino acid of NiaPro , . Epstein Epstein PR .

. pat iu eovd by evolved virus plant A rc al cd c USA Sci Acad Natl Proc

Genetic determinism and determinism Genetic

2016

is in its P3 protein .

t al. et

; et al. 1999 LS Pathog PLOS 33 . : . 541 o Plant Mol

EcoHealth

; Emerging Historical 354 - 553. : 593

– - . .

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: Theor Biol 27. 1987 26. viruses 25. 1990 energy activation the and constants rate of determination deamination: cytosine 24. ratesmutation across viruses 23. network: The influence asymmetric past evolutionary of history 22. Appl risk analysis: phylogenetic signal as a predictor of host range of plant pests and pathogens 21. evolve differently others? from the 20. 2012 Viruses viruses of nomenclature and 19. networks 18. and Triage SA Levin P, Kareiva In: 17. 1973 16. http://pvo.bio Database VIDE the from Lists and Descriptions Online: 15. ResVirus for prospects and conditions, world changing of effects emergence, driving factors 14. Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

; ; . ; 2012

25 29 54 . e S Nee A Sicard LA Frederico R Sanjuán Vacher C GS Gilbert C Desbiez King AMQ Weitz JS AG Power M Hill L Watson AJ, Gibbs MJ, Dallwitz K, Crabtree AA, Brunt RAC Jones Chao L Chao . : : : PLoS Pathog PLoS 277 2532 USA CA, Diego, San USA, MA, Waltham, UK, London, 427 .

2009 ; Trends Microbiol

5 . 1991 : Princeton, - – 869 281. - – 432. . mirror.cn/refs.htm .

2537. ;

. 141

; iest ad vnes a nfig oain n is consequences its and notation unifying a evenness: and Diversity h eouin f utcmatetl eoe i viruses in genomes multicompartmental of evolution The - Levels , , 153 , 878.

Poisot T ,

Michalakis Y Michalakis Piou D , . , , . : Moury B Moury

Flecker AS Flecker

113 Adams MJ Magarey

rm oeua gntc t pyoyais Evolutionary phylodynamics: to genetics molecular From : Plant virus emergence and evolution: origins, new encounter scenarios, encounter new origins, evolution: and emergence virus Plant 229

, 2016 USA: Princeton University Press; Kunkel TA Kunkel Comment citer cedocument: of selection, evolution of sex in RNA viruses, and the origin of life of origin the and viruses, RNA in sex of evolution selection, of –

– 130. (editors) , 246.

, 10.1099/jgv.0.000742 Desprez ; 12 Meyer JR 2013 . . ,

PLOS PathogPLOS

:

Ninth report of the International Committee on Taxonomy of of Taxonomy on Committee International the of report Ninth e1005819. Lecoq H Lecoq R . , ; , ,

Carstens EB ; Virus specificity in disease disease in specificity Virus now availableat: ht . 21 Gutiérrez S Gutiérrez Suiter K Suiter Inf Genet Evol The importance of species: Perspectives on Expendability on Perspectives species: of importance The , -

Loustau : Shaw BR Shaw 82 , Flores CO - . 91.

The hallmarks of "green" viruses: Do plant viruses plant Do viruses: "green" of hallmarks The ,

2012 Webb CO Webb

M 25 , , .

Lefkowitz EJ -

Blanc S Blanc A sensitive genetic assay for the detection of of detection the for assay genetic sensitive A L ;

8 2011 , . : Valverde S Architecture of an antagonistic tree/fungus e1002685. tp://sdb.im.ac.cn/vide/refs.htm; ; . 11

. 2003.

Evolutionary tools for phytosanitary for tools Evolutionary . The strange lifestyle of multipartite of lifestyle strange The : Version: 20th August 1996 August 20th Version: 812 systems: are species red species are systems: .

- Virus Virus taxonomy

pp . 824. et al. PLOS One PLOS . : 330

Elsevier Academic Press Academic Elsevier

Phage - 347. t al. et

2008 bacteria bacteria infection . .

. Classification ln Viruses Plant Biochemistry ; Ml Evol Mol J eeac of relevance 3 : e1740. .

1996

Ecology undant? undant? control ; . URL: URL: Evol Evol .

. J ; .

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: Pathol host a is protein 40. 682. to resistance r hypersensitive compromise suppress differentially RCY1 thaliana Arabidopsis in 39. Chenopodium quinoa seed pea 38. different species pepper between interaction hierarchical the 37. the of tobamoviruses elicitation the for required 36. 1994 diversicaudatum Xiphinema 35. under and bothhorizontal vertical 34. virusesmultipartite 3 of plant infection thecellular virus at level. 32. progression and host 31. 1978. 30. number virus is differentiallyregulated ina multipartite 29. GenetPLOS genome segmentation can result from a trade 28. 3. Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

;

124 2014 Desbiez C Sekine KT K Andersen R Tomita De A laCruz CE Taylor Stewart AD J Iranzo Tromas N S Gutiérrez Maynard Smith A Sicard S Ojosnegros - : borne mosaic potyvirus modulates the ability of the virus to move systemically in in systemically move to virus the of ability the modulates potyvirus mosaic borne 469 ;

15 2011 - . 477. : Mol Plant 217 - , , specific determinant of systemic infectivity for two potyviruses two for infectivity systemic of determinant specific

, ,

, ; Yvon M Yvon , 7 Ken

Chandeysson Zwart MP , . arba SC Manrubia Ishihara Ishihara T , , - : , Proc RSoc B - 221. e1001344.

Logsdon Logsdon JM Michalakis Y Michalakis rw DJF Brown , to , . IE Johansen Lopez L Lopez Virology García - - Comment citer cedocument: . Taro S Taro host transmission host J Mol Plant

- .

Microbe Interact 10.1099/jgv.0.000742 The Evolution of Sex , Timchenko T Timchenko

- , Arriaza J Arriaza ,

n utvtd n ucliae biotopes uncultivated and cultivated in ,

, Lafforgue G 1998 Hase S Tenllado F Tenllado Hiroyuki M Hiroyuki

C , , transmission 2012

. - Kelley SE , . ,

Microbe Interact

A single conserved amino acid in the coat protein gene of gene protein coat the in acid amino conserved single A elo R Neilson

Blanc Lecoq H ; 241 in segmentation genome of dynamics Evolutionary Tobamovirus Capsicum , ; esponse 279 Kusano T , : . 304 Escarmís C Escarmís

Curr OpinCurr Virol

, S

PLOS GenetPLOS : , 1997 - 3812 Gronenborn B Gronenborn , . off off between genetic

Diaz Ruiz JR

, . 26 Elena Elena SF - . Virus population bottlenecks during within during bottlenecks population Virus

An empirical study of the evolution of virulence 311. Sakamoto M Sakamoto . , A short motif inthe N A short motif . -

Evolution

like cell death cell like Cambridge, UK: AT Jones ; - 10 3919. L ,

Shah J 2 : spp

107 2011 gene , . .

Manrubia SC Manrubia

Within 2014 - Nature Comm Nature and 113.

; 2005

24

- , 2012 et et al. . eitd eitne gis the against resistance mediated ,

, Sanz AI Sanz : ; The persistence and spread of of spread and persistence The Michalakis Y Michalakis Murai

L 108 10 . uubr oac virus mosaic Cucumber ; -

59 Plant Mol Biol Mol Plant host spatiotemporal dynamics ;

resistance gene alleles from from alleles gene resistance : 2 Single Single amino acid alterations content and particle stability Cambridge e1004186. - : : 117. 546 730

- J et al. terminal part ofthe coatterminal part

, - –

et al. et 2013 555. Perales C Perales 739.

. The coat protein is proteinis coat The

;

University Press Genetic basis for basis Genetic 4 et al. et n Ap Biol Appl Ann : 2248.

2006

.

Gene copy copy Gene et al. et Mol Plant Plant Mol

; 62

but , : Viral - 669 host - ; .

41. García-Castillo S, Marcos JF, Pallas V, Sanchez-Pina MA. Influence of the plant growing conditions on the translocation routes and systemic infection of carnation mottle virus in Chenopodium quinoa plants. Physiol Mol Plant Pathol 2001;58:229–238.

42. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2013. URL http://www.R-project.org/

43. Chao A, Gotelli NJ, Hsieh TC, Sander EL, Ma KH et al. Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs 2014;84:45–67.

44. Watson L, Dallwitz MJ. The families of angiosperms: automated descriptions, with interactive identification and information retrieval. Austral Syst Bot 1991;4:601–695.

45. Watson L, Dallwitz MJ. The families of flowering plants: Descriptions, illustrations, identification and information retrieval. ftp://www.keil.ukans.edu/pub/delta/; 1992.

46. Stevens PF. Angiosperm Phylogeny Website. Version 12, July 2012. http://www.mobot.org/MOBOT/research/APweb/; 2001.

47. Flores CO, Valverde S, Weitz JS. Multi-scale structure and geographic drivers of cross-infection within marine bacteria and phages. ISME J 2013;7:520–532.

48. Moury B, Janzac B, Ruellan Y, Simon V, Ben Khalifa M et al. Interaction patterns between Potato virus Y and eIF4E-mediated recessive resistance in the Solanaceae. J Virol 2014;88:9799–9807.

49. Hillung J, Cuevas JM, Valverde S, Elena SF. Experimental evolution of an emerging plant virus in host genotypes that differ in their susceptibility to infection. Evolution 2014;68:2467–2480.

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: properties Table transmission vector Kind of [VEC] grouped) (arthropods Vector type Vector type transmission horizontal Abiotic [VER] transmission Vertical [SEG] segments Genome [GEN] Genome type [Variable] Virus trait Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). 1

.

M .

ean

host range breadth range host Significance 1.2×10 0.005** 8.0 9.3×10 <10 0.87 1.3×10 ×10 - ns 15

*** Comment citer cedocument: - - - - 4 4 5 5

*** *** *** *** 10.1099/jgv.0.000742

*

whitefly other fungus no vector aphid Coleoptera nematode thrips tools none graft plants contact between substrate none seed transmission >3 segments 1 segment 2 segments 3 segments dsRNA dsDNA ssDNA ssRNA Virus group non circulative transmission no vector multiplying circulative and non fungus arthropod no vector nematode

of

groups of of groups

Hemiptera

28

plant plant

- virus 22.0 ab 12. 20.2 a 10.5 b 13.5 b 15.7 b 28.3 a 3.6 b 3.9 b 12.6 a 16.7 a absHRB 14.7 a 14.7 a 16.4 a 11.3 b 14.0 b 14.4 b 33.5 a 10.9 b 11.1 b 11.3 b 14.4 b 14.8 b 15.1 ab 33.5 a 38.3 ab 10.1 ab 14.4 b 16.3 ab 16.5 a es 8

b

sharing genome or transmission transmission or genome sharing ‡

6 340 140 20 316 109 35 31 21 44 384 N 207 165 80 24 312 120 24 35 55 24 120 168 42 24 6 14 319 80 61

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: according toKruskal datasetgroups Virus of480 virusspecies. letters sharing are notsignificantlydifferent ‡ † non * P Mean values range host of absolute breadth N: number ofvirus Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). - values of Kruskalvalues of - significant, p

- value<0.05, p value<0.05, species inthespecies group. - - Wallis significance arefollowedWallis by tests Wallis multiple comparisons (p Wallis Comment citer cedocument: 10.1099/jgv.0.000742

- value<0.01and p multiplying circulative and

(absHRB) for each virus group(absHRB) each based for virus ona

29

- value<0.001,respectively. - value>0.05). 14.6 a

ns , *,**,***when testsare

28

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: nestedness thanthe ormodularity ‡ † Probabilistic model; PD: degree model. hypothesesthem tonull statistical simulations) assessment for (1000 * species Table

P functionsR usedtoestimate nested Modularity Nestedness Analysis Models used to fill missing data in the infectivity matrix (100 simulations)Models missing used tofill datamatrix and intheinfectivity tocompare Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017). - values correspondingto the frequencymodel ofnull 2 infectivity matrix .

S

tatistical s tatistical

Algorithm spinglass.community leading.eigenvector.community wine NODF2 binmatnest2 † ignificance of nestedness and modularity in modularity and nestedness of ignificance

† Comment citer cedocument:

.

10.1099/jgv.0.000742 †

filled

ness ormodularity. †

infectivity matrix. 30 †

1000 100 <10 0.0 0.0 <10 <10

21 78 B simulations simulations - - - 5 B 5 5

‡ ×

*

1000 100 0.0 0.0 <10 <10 <10 a

37 virus species × 28 plant 28 × species virus 37 08 49 B (over 10 PD - - - [18] 5 5 5

×

. B: Bernoulli × 100 5

) 0.0 0.0 <10 <10 <10 1000 showing higher 13 37 PD PD - - - 5 5 5

B

100 PD 100 PD 1000 0.0 0.0 <10 <10 <10 06 23 PD - - - 5 5 5

×

Fig. 1: Conditional inference regression trees modelling the host range breadth (HRB) of plant viruses with four explanatory variables representing major biological virus properties. Explanatory variables were the genome nature of viruses (variable ‘GEN’; 4 categories: ssrna, ssdna, dsrna, dsdna), their number of genome segments (variable ‘SEG’; 4 categories: one, two, three, >3), the mode of vertical transmission (variable ‘VER’; 2 categories: seed, no seed) and the vector type (VEC) (arthropods were grouped; see Methods section). (a) Absolute host range breadth (absHRB) using a dataset of 480 virus species. (b) Relative host range breadth (relHRB) using a dataset of 293 virus species. Trees should be interpreted starting from the top, following each branch down to each node, to arrive at the average host range breadths indicated by each terminal node. For each terminal node, boxplots of host range breadths are represented. n: number of virus species in the group.

Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: Vitales, Pinales,Charales,Cornales, Ericales, and Geraniales Myrtales Ranunculales,Buxales, Cycadales, Proteales, Polypodiales, Dioscoreales, Saxifragales, III,II,Asparagales, Euasterids and Commelinids, Liliales, given qorder Wallis order, m of groups Fig. Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

host specieshost 2 :

multiple

Comparison of thediversity of Comparison of eans eans

viruses. . of Taxonomic r Taxonomic averaged test. Hill numbersHill

M Plant tax ean of Comment citer cedocument: for allat virus species different 10.1099/jgv.0.000742 ank X Hill numbers of orders q=0 numbers toq=3correspondingHill oforders tothe

onomic levels were compared levels between planttaxonomic by was onplantphylogeny based and

of plant taxa plant of

sharingare different letters notsignificantly 33

in the in plant taxonomic plant taxonomic

host and host

Caryophyllales,Alismatales, /or contain

[46] non ranks - .

host s

Eurosids IEurosids and . For each q

a species Kruskal

diversity

for afor - Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI:


Fig. 3: Evidence of nestedness (a) and modularity (b) in a 37 virus species × 28 plant species infectivity matrix. The two matrices correspond to the same dataset after permutation of lines and columns. Black boxes correspond to hosts (infection) and white boxes to non-hosts (no infection). Gray boxes correspond to missing data. Modularity analyses allowed identifying three modules (delineated by gray lines), partially associated with three plant families.

Modules 1 and 2 together contained the following plant species: Solanum tuberosum, Vigna unguiculata, Vicia faba, Cucumis sativus, Phaseolus vulgaris, Cucurbita pepo, Glycine max, Trifolium incarnatum, Trifolium repens, Pisum sativum, Spinacia oleracea, Chenopodium amaranticolor, Chenopodium quinoa, Chenopodium album, Beta vulgaris, Gomphrena globosa, Tetragonia tetragonioides, Lactuca sativa and Brassica campestris.

Module 3 contained the following plant species: Nicotiana clevelandii, Nicotiana rustica, Nicotiana tabacum, Nicotiana glutinosa, Physalis floridana, Lycopersicon esculentum, Datura stramonium, Petunia x hybrida and Zinnia elegans.

The three modules together contained the following virus species: AMV (alfalfa mosaic alfamovirus), ArMV (arabis mosaic nepovirus), BCTV (beet curly top curtovirus), BNYVV (beet necrotic yellow vein virus), BtMV (beet mosaic potyvirus), BWYV (beet western yellows polerovirus), BYMV (bean yellow mosaic potyvirus), CarMV (carnation mottle carmovirus), ClYMV (clover yellow mosaic potexvirus), CMV (cucumber mosaic cucumovirus), CPMMV (cowpea mild mottle carlavirus), CVB (chrysanthemum B carlavirus), CVMoV (carnation vein mottle potyvirus), CymRSV (cymbidium ringspot tombusvirus), HLV (heracleum latent vitivirus), HVS (helenium S carlavirus), OkMV (okra mosaic tymovirus), PFBV (pelargonium flower break carmovirus), PhyMV (physalis mosaic tymovirus), PopMV (poplar mosaic carlavirus), PSbMV (pea seedborne mosaic potyvirus), PSV (peanut stunt cucumovirus), PTV (Peru tomato mosaic potyvirus), PVMV (pepper veinal mottle potyvirus), RCNMV (red clover necrotic mosaic dianthovirus), RMV (ribgrass mosaic tobamovirus), SLRSV (strawberry latent ringspot sadwavirus), SMV (soybean mosaic potyvirus), SPMMV (sweet potato mild mottle ipomovirus), SqMV (squash mosaic comovirus), TBRV (tomato black ring nepovirus), TEV (tobacco etch potyvirus), TNV-A (tobacco necrosis A necrovirus), TRSV (tobacco ringspot nepovirus), TRV (tobacco rattle tobravirus), TSV (tobacco streak ilarvirus) and TuMV (turnip mosaic potyvirus).

Method S1. Determination of a virus subset with a good compromise between the precision of host range breadth (HRB) estimates and the number of remaining viruses

In the VIDE database, the total number of assayed plant species (i.e. species for which the host or non-host status is known) varies greatly among virus species (see Figure below). Accordingly, for some viruses the absolute host range breadth (absHRB), corresponding to the total number of reported host species, is an imprecise estimate of HRB. To take into account the total number of plant species assayed for each virus, we used the relative host range breadth (relHRB), i.e. the absHRB divided by the total number of assayed plant species. When the number of assayed plant species increases, so does the precision of HRB estimates, and absHRB and relHRB tend to be equivalent. A global coefficient of correlation between absHRB and relHRB is therefore an indicator of the precision of HRB estimates.

As the minimal threshold of assayed plant species increases, the Spearman's coefficient of correlation (ρ) between absHRB and relHRB increases, while the number of virus species remaining in the dataset decreases (see Figure below). Note, however, that when the number of assayed plant species becomes high (>50), the number of remaining virus species is too low for a precise estimation of the coefficient of correlation between absHRB and relHRB. A change of slope of the linear regression between ρ and the number of assayed plant species was noticeable for small numbers of assayed plant species (~15 plant species), beyond which the increase of ρ per additional plant species became smaller. Consequently, this threshold of 15 plant species corresponds to a good compromise between the precision of HRB estimation (ρ=0.74) and the number of remaining virus species in the dataset (293 species).

Therefore, the relHRB was only analyzed for the restricted set of virus species (293 species) for which a minimum of 15 plant species was assayed (Table S1). The absHRB was analyzed for the 293-virus dataset but also for more exhaustive datasets (Table 1; Suppl. Table S1). Globally, we expect more precision from the 293-virus dataset but more statistical power for more exhaustive datasets, especially for underrepresented virus groups (for example for viruses with a double-stranded genome).
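
The threshold scan described in Method S1 can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names (`rel_hrb`, `spearman`, `threshold_scan`) and the toy records are hypothetical, and the Spearman coefficient is computed here in plain Python as the Pearson correlation of tie-averaged ranks.

```python
import math
from statistics import mean


def rel_hrb(n_hosts, n_assayed):
    """Relative host range breadth: absHRB divided by the number of assayed plant species."""
    return n_hosts / n_assayed


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks (nan if a variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def threshold_scan(records, thresholds):
    """records: (n_hosts, n_assayed) per virus. Returns {threshold: (n_remaining, rho)}."""
    result = {}
    for t in thresholds:
        kept = [(h, a) for h, a in records if a >= t]
        if len(kept) < 3:  # too few viruses for a meaningful correlation
            continue
        abs_values = [h for h, _ in kept]
        rel_values = [rel_hrb(h, a) for h, a in kept]
        result[t] = (len(kept), spearman(abs_values, rel_values))
    return result


# Hypothetical toy data: (number of hosts, number of assayed plant species).
toy = [(2, 4), (5, 10), (3, 20), (12, 16), (20, 40), (30, 35), (8, 60)]
scan = threshold_scan(toy, thresholds=[1, 15])
```

Raising the threshold trades the number of remaining viruses against the agreement between absHRB and relHRB, which is the compromise the method describes.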

Method S2. Tentative models for the prediction of plant virus host range breadth

We defined an ordinal variable, Class_absHRB, in the 480 virus species database by ordering the virus species into three equilibrated classes of increasing absHRB: class 1 (1 to 5 hosts; 145 virus species), class 2 (6 to 15 hosts; 170 species) and class 3 (>15 hosts; 165 species). We analyzed the performance of a cumulative link model with logit link explaining Class_absHRB as a function of the four main viral properties found significant (VEC, SEG, GEN, VER). The model included only the main effects of the explanatory variables, without any interaction (most interaction parameters could not be estimated). The model was fitted with the function “clm” of the “ordinal” package in the R software version 3.0.2.

The predictive performance of the model was evaluated by cross-validation tests. In these tests, we randomly left out 10% of the virus species (48 species), fitted the model to the remaining dataset (432 species) and determined the confusion matrix of the 48 left-out virus species (i.e. a two-dimension contingency table (“actual” and “predicted”) with the numbers of false positives, false negatives, true positives and true negatives for the 3 classes of absHRB). These steps were iterated 500 times. The predictive performance of the model was summarized by the average sensitivity, specificity, positive predictive value and negative predictive value for each absHRB class, using the "one versus all" approach. These indices were calculated with the ‘confusionMatrix’ function of the ‘caret’ package in the R software, version 3.0.2.

The sensitivity is the probability that the model classifies in class ci (1≤i≤3) a virus that effectively belongs to class ci. The specificity is the probability that the model does not classify in class ci a virus that is effectively not in class ci. The positive predictive value (PPV) is the probability that a virus classified in class ci by the model is indeed in class ci (i.e. true positive result of the model). The negative predictive value (NPV) is the probability that
(i.e. The that Version postprint Determinants ofhostspecies rangein plantviruses. JournalofGeneral Virology, 98(4),862-873. ,DOI: absHRB compared obtained with a totheclassifier ones random baseline ( model is 0.65 probabilityclassified thata inclass virus 1 model ( to estimate thepositivepredictive value prevalence ofthe and and table below frequentclass inthe dataset). class ofcorrect randomly)(proportion orchoosing themost obtained assignments 0.35by to ano ratecorrect of assignment of the model). a c virusnotclassified inclass Class 3:>15 Class Class 2: 6 to 15 hosts 5hosts 1to Class1: absHRB class Moury, B. (Auteurdecorrespondance), Fabre,F.,Hébrard, E.,Froissart, R.(2017).

3 0.79 for class ,

respectively - does truly notbelong to (respectively information rate of 0.33assignmentscorrect (proportion byinformation of rate obtained a choosing i.e.

class randomly), indicated randomly), inbracketsclass thetable in below.

the . Focusing on thetwoextreme. Focusing classes,thespecificity on ofthe model

hosts

Over the 500 cross theOver 500 proportions oftrue andnegativeproportions positive true Accordingly, the results).

es three ). Sensitivitycancombined and measures be specificity

1

0.55

and

absHRB classes Comment citer cedocument: 0.35 0.35 0.3 Prevalence )

s . The probability thata virus 3

10.1099/jgv.0.000742

to , respectively

The de absHRB classes m bythe i

this classthis by class notin c themodel indeed is - validations realized, theoverallvalidations realized, ( accuracy

tailed analysis foris providedin analysis each absHRBclasstailed 0.47 0.6 0.36 Sensitivity ,

Model performance index for species host Model performance index (PPV) i.e. is 0.77 (respectivelyis 0.77 be These figuresshould 0.74). ) but its sensitivity (0.36) butits lower

(0.33) (or

(0.33) (0.33) their 41

3 and negativeand predictive value )

by themodel proportion the in

0.79 0.5 0.92 Specificity not classified1( inclass odel) was 0.48.

(0.67)

(0.67) (0.67) belongs truly to class 1 (or belongs trulytoclass 1

i.e.

cross i

(i.e. t a classifier choosing an 0.55 0.4 0.65 PPV It becompared should - and

validation validation (0.35) rue negativerue result

(0.35) (0.3) with

0.47 for class

i.e. (NPV was

or the the overall

3) by the

) high (0.92high data 0.74 0.69 0.77 NPV

of the of the sets

(0.65) (0.65) (0.7) the

es 1 3 , )

s
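
The "one versus all" indices defined above, and the way sensitivity and specificity combine with prevalence to give PPV and NPV, can be illustrated as follows. This is a sketch under stated assumptions rather than the `confusionMatrix` implementation of the R package `caret`: the function names and the toy confusion matrix are hypothetical, and the combination step is the standard Bayes relation.

```python
def one_vs_all(confusion, k):
    """Indices for class k from a square matrix confusion[actual][predicted] of counts."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k][j] for j in range(n) if j != k)
    fp = sum(confusion[i][k] for i in range(n) if i != k)
    tn = sum(confusion[i][j] for i in range(n) for j in range(n) if i != k and j != k)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_npv_from_rates(sensitivity, specificity, prevalence):
    """Combine sensitivity and specificity with class prevalence (Bayes relation)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical confusion matrix for one cross-validation fold of 48 left-out viruses.
toy_confusion = [
    [10, 5, 1],  # actual class 1
    [6, 9, 4],   # actual class 2
    [2, 5, 6],   # actual class 3
]
class1 = one_vs_all(toy_confusion, 0)

# Class 1 values reported in the table above: sensitivity 0.36, specificity 0.92, prevalence 0.3.
ppv1, npv1 = ppv_npv_from_rates(0.36, 0.92, 0.3)

# Random baseline choosing one of three classes: sensitivity 1/3, specificity 2/3.
ppv_random, npv_random = ppv_npv_from_rates(1 / 3, 2 / 3, 0.3)
```

With the class 1 figures, the relation gives values within about 0.01 of the reported PPV (0.65) and NPV (0.77); small gaps are expected because the published indices are averages over 500 folds. For the random baseline, the same relation returns the prevalence as PPV and its complement as NPV, which matches the bracketed values.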

Next, we tested how the model performance changes if classes 2 and 3 are grouped. We defined a new binary variable (TwoClass_absHRB) with two classes of increasing absHRB: class 1 (1 to 5 hosts; 145 virus species) and class 2 (>5 hosts; 335 species). We then used the same analysis framework. The overall accuracy of this model was 0.75, which should be compared to a no-information rate of 0.5. The specificity of the model was high (about 0.9) but its sensitivity was low (about 0.3). The PPV and NPV were 0.73 and 0.76, respectively. Finally, we estimated the area under the Receiver Operating Characteristic curve associated to this binary predictor using the package pROC. The area (AUC) also measures the predictive accuracy of a binary predictor. It ranges from 0.5 (no discrimination between the 2 classes) to 1 (perfect discrimination). The estimated AUC was 0.73, with a 95% confidence interval ranging from 0.68 to 0.78.

The same analyses were performed to predict the diversity of host genera and families (Hill numbers of order q=2, Hill gen and Hill fam, respectively; see the Methods section). We defined the ordered variable Class_Hill gen with the following three equilibrated classes of increasing diversity of host genera: class 1 (Hill gen < 3.33; 156 virus species), class 2 (3.33 ≤ Hill gen ≤ 7.25; 160 species) and class 3 (Hill gen > 7.25; 164 species). Similarly, we defined the ordered variable Class_Hill fam with the following three equilibrated classes of increasing diversity of host families: class 1 (Hill fam < 1.19; 159 virus species), class 2 (1.19 ≤ Hill fam ≤ 2.9; 158 species) and class 3 (Hill fam > 2.9; 163 species). The overall rate of correct assignment of the model was slightly lower (0.45) for Hill gen and slightly higher (0.53) for Hill fam than for absHRB (0.48). These values should be compared to a no-information rate of 0.33 (proportion of correct assignments obtained by choosing a class randomly). The detailed performances are summarized in the tables below.
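
The AUC reported above for the binary predictor has a simple rank interpretation: it is the probability that a randomly chosen member of the positive class receives a higher score than a randomly chosen member of the negative class, ties counting for one half. The sketch below computes it by direct pair counting; it is not the pROC implementation (an R package), and the function name and scores are hypothetical.

```python
def auc_by_pairs(scores_positive, scores_negative):
    """Area under the ROC curve as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count for one half
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities of belonging to the positive class.
perfect = auc_by_pairs([0.9, 0.8], [0.1, 0.2])         # every pair ranked correctly
uninformative = auc_by_pairs([0.5], [0.5])              # a tie, no discrimination
intermediate = auc_by_pairs([0.9, 0.4], [0.6, 0.1])     # 3 of 4 pairs ranked correctly
```

The two extremes correspond to the bounds quoted in the text: 1 for perfect discrimination and 0.5 for no discrimination between the two classes.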

Model performance index for the diversity of host families

Hill fam class             Prevalence   Sensitivity   Specificity   PPV           NPV
Class 1: < 1.19            0.33         0.45 (0.33)   0.91 (0.67)   0.71 (0.33)   0.77 (0.67)
Class 2: 1.19 to 2.9       0.32         0.59 (0.33)   0.54 (0.67)   0.38 (0.33)   0.73 (0.67)
Class 3: > 2.9             0.34         0.56 (0.33)   0.86 (0.67)   0.67 (0.33)   0.79 (0.67)

Model performance index for the diversity of host genera

Hill gen class             Prevalence   Sensitivity   Specificity   PPV           NPV
Class 1: < 3.33            0.32         0.36 (0.33)   0.86 (0.67)   0.55 (0.33)   0.74 (0.67)
Class 2: 3.33 to 7.25      0.33         0.55 (0.33)   0.51 (0.67)   0.36 (0.33)   0.69 (0.67)
Class 3: > 7.25            0.34         0.46 (0.33)   0.82 (0.67)   0.57 (0.33)   0.75 (0.67)
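
The Hill numbers used to build these classes can be computed with the standard formula D_q = (Σ p_i^q)^(1/(1−q)), which for q=2 reduces to the inverse Simpson index 1/Σ p_i². The sketch below is illustrative only: it assumes that p_i is the share of a virus's host species falling in plant family (or genus) i, and the function names and counts are hypothetical. The class boundaries are those given in the text for Hill fam.

```python
import math


def hill_number(counts, q):
    """Hill number of order q from counts per taxon (e.g. host species per plant family)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1 / (1 - q))


def hill_fam_class(value, low=1.19, high=2.9):
    """Class 1 below `low`, class 3 above `high`, class 2 otherwise (thresholds from the text)."""
    if value < low:
        return 1
    if value > high:
        return 3
    return 2


# Hypothetical virus with 10 host species spread over three plant families.
uneven = hill_number([8, 1, 1], q=2)     # dominated by one family
even = hill_number([5, 5, 5, 5], q=2)    # four equally represented families
single = hill_number([10], q=2)          # all hosts in one family
```

An even spread over four families gives a Hill number of 4 (class 3), hosts confined to one family give 1 (class 1), and the uneven example gives about 1.5 (class 2), which shows how order 2 down-weights rare families.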

Table S1. Robustness of the links between host range breadth (HRB) estimates and five biological properties of virus species.

Different datasets (columns) were analyzed, corresponding to different requirements in terms of numbers of assayed plant species (n) and/or in terms of diversity and evenness of distribution of these plant species across genera or families (Hill gen and Hill fam, respectively, corresponding to the Hill number of order 2; see Methods section). As a consequence, the resulting datasets comprised varying numbers of virus species (N). The absolute HRB was analyzed for white columns; absolute and relative HRBs were analyzed for gray columns and gave similar results, unless indicated. In that case, results for absolute and relative HRBs are on the left and on the right of the slash character, respectively.

Rows (virus properties): Genome type; Genome segments; Vertical transmission; Vector type (arthropods grouped); Kind of vector transmission.

Columns (datasets): requirements n ≥ 15, thresholds on Hill gen, thresholds on Hill fam, and combinations of n ≥ 15 with thresholds on Hill gen or Hill fam.

✓: a globally significant effect (p-value<0.05) was detected with Kruskal-Wallis test and the different virus groups showed significant differences similar to those reported in Table 1.
*: a globally significant effect (p-value<0.05) was detected with Kruskal-Wallis test but no significant differences were observed between virus groups with Kruskal-Wallis multiple test. However, the ordering of virus groups in terms of HRB was similar to that reported in Table 1.
NS: no significant effect (p-value>0.05) was detected with Kruskal-Wallis test.

Table S2. Comparison of host range breadths of plant virus genera.

Virus genus    absHRB† (293 viruses)   relHRB† (293 viruses)   N293*   absHRB† (480 viruses)   N480*
Nepovirus      31.3 a                  73% a                   23      28.7 a                  26
Ilarvirus      28.4 ab                 71% ab                  12      22.8 ab                 16
Potexvirus     19.6 ab                 56% abc                 15      14.7 ab                 23
Potyvirus      18.8 ab                 43% c                   60      15.5 b                  79
Tymovirus      18.8 ab                 41% c                   16      17.0 ab                 18
Comovirus      16.0 ab                 54% abc                 13      15.1 ab                 14
Carlavirus     14.1 b                  37% c                   20      10.8 b                  31
Begomovirus    12.3 b                  42% bc                  15      9.1 b                   29

Mean values of absolute (absHRB) or relative (relHRB) host range breadth for different virus genera. Only genera containing at least 12 virus species in the 293-virus dataset were considered.
* N293 and N480: number of viruses of each genus in the 293- and 480-virus datasets, respectively.
† Virus genera sharing letters are not significantly different according to Kruskal-Wallis multiple comparisons (p-value>0.05).

Fig. S1: Multiple Correspondence Analysis (MCA) of six plant virus traits used as explanatory variables for virus host range breadth. Variables were projected on the first two axes of the MCA and the percentage of variance explained by each axis is indicated. (a) MCA using the whole virus dataset (480 species) with absolute host range breadth (absHRB) included as illustrative variable. (b) MCA using the restricted virus dataset (293 species for which ≥15 plant species were assayed) with absHRB and relative host range breadth (relHRB) included as illustrative variables.