Figure S1: Length distribution and amino acid composition of KV-LCC10 proteins. (A) Boxplots representing the length distribution of 83 proteins with a match in the NR database and an identified orthologue in KV-Sc, 113 hypothetical proteins with >60% identity to their KV-Sc orthologue, and 136 hypothetical proteins with <60% identity to their KV-Sc orthologue. (B) Amino acid frequencies in the same three protein categories.

[Figure S2 tree image. Tip labels give the virus or organism name, the protein accession and, in parentheses, the host gene (VLTF3, MCP, RNAP1, RNAP2, HSP70 or -). Kaumoebavirus tips: KLCC10 0232 and 0233 (MCP), KLCC10 0108 (RNAP1), K-Sc.BNJ 00233 and 00235 (MCP), K-Sc.BNJ 00111 (RNAP1). Scale bar: 0.50.]

Figure S2. Phylogenetic reconstruction of homing endonucleases. The phylogenetic tree was reconstructed with FastTree using its default parameters. The SH-aLRT branch support is indicated beside each internal node. The scale bar represents the number of amino acid substitutions per site. The name of the gene harboring the intron-encoded nuclease is given in parentheses. (-) indicates that the endonuclease ORF is not inserted in an intron.

[Figure S3 image. Panel A legend: Duplicated genes, Single-copy genes, Genes absent in K-Sc. Panel B legend (best match): Prokaryote, Virus. x-axis: Genomic coordinate (1, 100 Kb, 200 Kb, 300 Kb); y-axis: Gene density (0 to 8E-06 in A, up to 4E-06 in B).]

Fig. S3: Gene distribution in the K-LCC10 genome. (A) Density distributions for duplicated genes, single-copy genes and genes not found in the K-Sc genome. (B) Density distributions according to the taxonomy of each gene's best match in TrEMBL.

Table S1: Average pairwise protein distances between 40 single-copy core gene families of extended Asfarviridae

                                 (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
(1) Faustovirus M6 (clade E9)
(2) Faustovirus D3 (clade D)     0.12
(3) Faustovirus E12 (clade M/L)  0.21  0.21
(4) Pacmanvirus A23              1.15  1.14  1.13
(5) Pacmanvirus A19              1.15  1.14  1.13  0.04
(6) ASFV Ken06.Bus               1.67  1.67  1.68  1.52  1.52
(7) ASFV Odintsovo_02.14         1.67  1.67  1.68  1.52  1.52  0.04
(8) Kaumoebavirus LCC10          1.73  1.72  1.73  1.61  1.62  1.76  1.76
(9) Kaumoebavirus Sc             1.74  1.73  1.74  1.62  1.62  1.77  1.77  0.11

Information: 40 single-copy core protein families were identified in 33 fully sequenced extended Asfarviridae genomes (16 Faustoviruses, 13 Asfarviruses, 2 Pacmanviruses, 2 Kaumoebaviruses) using OrthoFinder. A multiple alignment of each protein family was generated with MAFFT. After removal of all columns containing gaps, the 40 multiple alignments were concatenated into a single alignment of 15,329 amino acid sites available for pairwise distance calculation. Pairwise amino acid distances were calculated using the codeml program and the WAG substitution model. This table shows only three Faustovirus representatives (clades E9, D and M/L) and the two ASFV representatives with the largest protein distance among all ASFV genomes.
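The gap-removal and concatenation steps described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names (`drop_gapped_columns`, `concatenate`) and the toy sequences are hypothetical, and it assumes each family alignment is already available as a mapping from genome name to aligned sequence. The distance calculation itself (codeml with the WAG model) is not reproduced here.

```python
def drop_gapped_columns(alignment):
    """Return the alignment restricted to columns where no sequence has a gap ('-')."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) iterates over alignment columns, one tuple of residues per column.
    kept = [column for column in zip(*rows) if "-" not in column]
    return {
        name: "".join(column[i] for column in kept)
        for i, name in enumerate(names)
    }


def concatenate(alignments):
    """Join several gap-free alignments end to end, genome by genome."""
    names = list(alignments[0])
    for alignment in alignments:
        if set(alignment) != set(names):
            raise ValueError("every alignment must contain the same genomes")
    return {
        name: "".join(alignment[name] for alignment in alignments)
        for name in names
    }


# Toy data (hypothetical): two protein families aligned across three genomes.
family_1 = {"virus_A": "MK-LV", "virus_B": "MKTLV", "virus_C": "MR-LI"}
family_2 = {"virus_A": "AG-S", "virus_B": "A-TS", "virus_C": "AGTS"}

supermatrix = concatenate(
    [drop_gapped_columns(family) for family in (family_1, family_2)]
)
n_sites = len(next(iter(supermatrix.values())))
print(supermatrix)
print("sites available for distance calculation:", n_sites)
```

In this toy case, one column is dropped from the first family and two from the second, leaving six sites per genome. Removing every gapped column before concatenation means each pairwise distance is computed over the same set of sites, at the cost of discarding positions that are informative for only some genomes.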
