Supplementary Figures and Tables
[Figure: mHi-C pipeline flowchart, steps 1-3. Alignment: the two ends of each Hi-C read (End 1, End 2, joined at the ligation site) are aligned separately, yielding uni-reads, multi-reads, and chimeric reads (>25 bps). Read ends pairing: uniquely mapping read pairs and multi-mapping read pairs are kept; unmapped reads, singleton reads, and low-quality multi-reads (>99) are set aside. Valid fragment filtering: valid read pairs satisfy 50 bps < d1 + d2 < 800 bps and >25k bps. Other labels shown: short-range contacts (<25k bps), invalid alignments (d1 + d2 > 800 bps, d1 + d2 < 50 bps), dangling end, self circle, religation.]
Supplementary Figure 1 mHi-C pipeline (Alignment - Read end pairing - Valid fragment filtering). 1. Read ends are aligned to the reference genome separately, allowing multi-reads, and chimeric reads are rescued. 2. Read ends are paired by their read query names. Multi-reads form more than one read pair with the same read query name. Read ends that fail to align form either unmapped reads or singleton reads and are discarded. Multi-reads with ends aligning to more than 99 positions are regarded as low-quality multi-reads and are excluded from the downstream analysis. 3. Validation checking filters out short-range contacts and alignments far away from restriction enzyme recognition sites. Contacts residing within the same restriction fragment, i.e., dangling ends or self circles, as well as contacts between adjacent fragments (religation), are discarded. The above three processing steps are applied to each read independently, enabling parallel implementation.
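The filtering rules in step 3 can be illustrated with a minimal Python sketch. It is not the mHi-C implementation: the `ReadPair` fields, the category names, the order of the checks, and the reading of `d1` and `d2` as distances from each read end to its restriction site are all assumptions made for illustration. The numeric thresholds are the ones printed in the figure, and treating both d1 + d2 < 50 bps and d1 + d2 > 800 bps as invalid alignments is likewise an assumption.

```python
from dataclasses import dataclass

# Thresholds as printed in the figure; how they combine is an assumption.
MAX_MULTI_POSITIONS = 99        # more candidate positions => low-quality multi-read
MIN_SITE_DIST_SUM = 50          # bps, lower bound on d1 + d2
MAX_SITE_DIST_SUM = 800         # bps, upper bound on d1 + d2
MIN_CONTACT_DISTANCE = 25_000   # bps, below this a contact is short-range


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one candidate alignment of a read pair."""
    chrom1: str
    pos1: int
    frag1: int  # restriction fragment index within chrom1 (assumed)
    d1: int     # distance of end 1 to its restriction site (assumed meaning)
    chrom2: str
    pos2: int
    frag2: int
    d2: int


def is_low_quality_multiread(n_positions: int) -> bool:
    """Multi-reads aligning to more than 99 positions are excluded."""
    return n_positions > MAX_MULTI_POSITIONS


def classify_pair(pair: ReadPair) -> str:
    """Assign a read pair to one of the categories named in the caption."""
    if pair.chrom1 == pair.chrom2:
        if pair.frag1 == pair.frag2:
            return "same_fragment"      # dangling end or self circle
        if abs(pair.frag1 - pair.frag2) == 1:
            return "religation"         # adjacent fragments
        if abs(pair.pos1 - pair.pos2) < MIN_CONTACT_DISTANCE:
            return "short_range"
    if not (MIN_SITE_DIST_SUM < pair.d1 + pair.d2 < MAX_SITE_DIST_SUM):
        return "invalid_alignment"      # too far from restriction sites
    return "valid"
```

Because each pair is classified from its own fields only, a function of this shape can be mapped over reads in parallel, in line with the caption's remark that the three steps apply to each read independently.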
[Figure: mHi-C pipeline flowchart, steps 4-7. Valid read pairs (50 bps < d1 + d2 < 800 bps, >25k bps) pass to duplicate removal, illustrated with uni-reads and multi-reads A and B carrying 1 and 2 mismatches. Genome binning (40 Kb bins) turns read pairs into uni-bin pairs and multi-bin pairs, with some multi-reads reduced to uni-bin pairs. mHi-C assigns allocation probabilities (example: Prob = 0.9 versus 0.1), reducing multi-bin pairs to uni-bin pairs. Contact matrix example: Bin j and Bin k with 3 contact counts.]
Supplementary Figure 2 mHi-C pipeline (Duplicate removal - Genome binning - mHi-C). 4. PCR duplicates are removed such that when a uni-read and a multi-read have the same alignment position and strand direction, the uni-read is kept. When multi-reads overlap with other multi-reads, the ones with alphabetically larger IDs are removed. 5. The genome is split into fixed-size non-overlapping intervals (bins), or into intervals of a fixed number of restriction fragments; as a result, read alignment position pairs are reduced to bin pairs. Multi-reads whose candidate alignment positions fall into the same bin are reduced to uni-bin pairs. 6. The mHi-C model estimates an allocation probability for each potential contact and enables filtering of contacts by thresholding this allocation probability. 7. Uni-reads and thresholded multi-reads are used to construct the contact matrix.
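Steps 5 and 7 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 40 Kb bin size is taken from the figure, and the allocation probabilities of step 6 are treated as given inputs rather than estimated by the mHi-C model. Counting a multi-read toward its highest-probability bin pair only when that probability exceeds a threshold of 0.5 is also an assumption, not a rule stated in the caption.

```python
from collections import Counter

BIN_SIZE = 40_000  # 40 Kb bins, as drawn in the figure


def to_bin_pair(chrom1, pos1, chrom2, pos2, bin_size=BIN_SIZE):
    """Map an alignment position pair to an order-independent bin pair."""
    a = (chrom1, pos1 // bin_size)
    b = (chrom2, pos2 // bin_size)
    return (a, b) if a <= b else (b, a)


def reduce_to_bin_pairs(candidates, bin_size=BIN_SIZE):
    """Collapse the candidate alignments of one read pair to distinct bin pairs.

    A result of size 1 means the multi-read has become a uni-bin pair.
    """
    return {to_bin_pair(*c, bin_size=bin_size) for c in candidates}


def build_contact_counts(uni_bin_pairs, multi_allocations, threshold=0.5):
    """Count contacts from uni-reads plus thresholded multi-reads.

    multi_allocations: one dict per multi-read, mapping each candidate
    bin pair to its allocation probability (supplied by the caller here).
    """
    counts = Counter(uni_bin_pairs)
    for allocation in multi_allocations:
        best_pair, best_prob = max(allocation.items(), key=lambda kv: kv[1])
        if best_prob > threshold:  # assumed thresholding rule
            counts[best_pair] += 1
    return counts
```

With two uni-reads on a bin pair and one multi-read allocated to it with probability 0.9 against 0.1 for the alternative, `build_contact_counts` returns a count of 3 for that pair, mirroring the toy contact matrix in the figure.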
Supplementary Figure 3 Multi-reads due to chimeric reads (IMR90). [Figure: bar plots for replicates rep1 to rep6, four bars per replicate (read end 1 and read end 2, each without and with chimeric reads). Panel a: numbers of read ends (y-axis "# of Reads", 0e+00 to 6e+08). Panel b: percentage of multi-reads (y-axis "Percentage of Multi-reads"). Percentages printed on the bars, in extraction order: rep1 10.68%, 10.74%, 12.14%, 12.24%; rep2 9.89%, 9.93%, 11.35%, 11.35%; rep3 10.78%, 10.86%, 11.03%, 11.03%; rep4 9.64%, 9.62%, 10.74%, 10.73%; rep5 9.6%, 9.63%, 10.67%, 10.61%; rep6 10.31%, 10.39%, 10.82%, 10.81%.]
[Figure: panels a-e for the Rings, Trophozoites Rep1, Trophozoites Rep2, and Schizonts samples. Axis labels: "# of Reads", "Percentage of Multi-mapping Reads", "Count", "Uni-mapping Bin-pair Contact Count", "Uni&Multi-mapping Bin-pair Contact Count". Legend: read end 1 and read end 2, each without and with chimeric reads; Uni-setting and Uni&Multi-setting. Multi-mapping percentages printed on the bars range from 7.32% to 13.21%.]