Supplementary Figures and Tables

[Figure: mHi-C pipeline flowchart, steps 1-3. Alignment: the two ends of each Hi-C read (End 1, End 2, joined at the ligation site) are aligned separately, giving uni-reads, multi-reads, and chimeric reads (>25 bps). Read ends pairing: uniquely mapping read pairs and multi-mapping read pairs are kept; unmapped reads, singleton reads, and low-quality multi-reads (>99 alignments) are discarded. Valid fragment filtering: valid read pairs (50 bps < d1 + d2 < 800 bps, >25k bps), short-range contacts (<25k bps), invalid alignments (d1 + d2 > 800 bps or d1 + d2 < 50 bps), dangling end, self circle, religation.]

Supplementary Figure 1 mHi-C pipeline (Alignment - Read end pairing - Valid fragment filtering). 1. Read ends are aligned to the reference genome separately, allowing multi-reads, and chimeric reads are rescued. 2. Read ends are paired by their read query names. Multi-reads form more than one read pair with the same read query name. Read ends that fail to align form either unmapped reads or singleton reads and are discarded. Multi-reads with ends aligning to more than 99 positions are regarded as low-quality multi-reads and are excluded from the downstream analysis. 3. Validation checking filters out short-range contacts and alignments far away from restriction enzyme recognition sites. Contacts residing within the same restriction fragment (i.e., dangling end or self circle) or in adjacent fragments (religation) are discarded. The above three processing steps are applied to each read independently, enabling parallel implementation.
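The pairing rules of step 2 can be sketched as a small classifier. This is a minimal illustration, not the mHi-C implementation: the function name, the input (number of candidate alignments per read end), and the reading of "unmapped" as both ends failing to align and "singleton" as exactly one end failing are assumptions. Only the 99-alignment cutoff is taken from the caption.

```python
MAX_ALIGNMENTS = 99  # ends aligning to more than 99 positions are low quality (caption)


def classify_read_pair(n_align_end1: int, n_align_end2: int) -> str:
    """Classify a read pair from the number of candidate alignments of each end.

    Hypothetical helper: category names follow the figure, the decision
    order is an assumption made for illustration.
    """
    if n_align_end1 == 0 and n_align_end2 == 0:
        return "unmapped"            # assumed: neither end aligned
    if n_align_end1 == 0 or n_align_end2 == 0:
        return "singleton"           # assumed: exactly one end aligned
    if max(n_align_end1, n_align_end2) > MAX_ALIGNMENTS:
        return "low-quality multi-read"
    if n_align_end1 == 1 and n_align_end2 == 1:
        return "uni-read"
    return "multi-read"
```

Because the classification depends only on the read pair itself, it can be applied to each read independently, which is what makes the parallel implementation mentioned in the caption possible.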

[Figure: mHi-C pipeline flowchart, steps 4-7. Duplicate removal among valid read pairs (uni-reads versus multi-reads; multi-reads A and B with 1 and 2 mismatches). Genome binning into 40 Kb bins: uni-bin pairs, multi-bin pairs, and multi-reads reduced to uni-bin pairs. mHi-C: allocation probabilities (example: Prob = 0.9 and 0.1). Contact matrix between Bin k and Bin j (example: 3 contact counts).]

Supplementary Figure 2 mHi-C pipeline (Duplicate removal - Genome binning - mHi-C). 4. PCR duplicates are removed such that, when a uni-read and a multi-read have the same alignment position and strand direction, the uni-read is kept. When multi-reads overlap with other multi-reads, the ones with alphabetically larger IDs are removed. 5. The genome is split into fixed-size non-overlapping intervals (bins) or into intervals of a fixed number of restriction fragments; as a result, pairs of read alignment positions are reduced to bin pairs. Multi-reads whose candidate alignment positions fall into the same bin are reduced to uni-bin pairs. 6. The mHi-C model estimates an allocation probability for each potential contact and enables filtering of contacts by thresholding this allocation probability. 7. Uni-reads and thresholded multi-reads are used to construct the contact matrix.
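The binning in step 5 can be illustrated with a short sketch. It assumes fixed-size 40 kb bins (the size shown in the figure) and represents each candidate alignment of a read pair as a hypothetical tuple `(chrom1, pos1, chrom2, pos2)`; the function names are illustrative and not part of mHi-C.

```python
from typing import Iterable, Set, Tuple

BIN_SIZE = 40_000  # 40 kb bins, as drawn in the figure

Bin = Tuple[str, int]
Candidate = Tuple[str, int, str, int]


def to_bin_pair(chrom1: str, pos1: int, chrom2: str, pos2: int,
                bin_size: int = BIN_SIZE) -> Tuple[Bin, Bin]:
    """Map a pair of alignment positions to an order-independent bin pair."""
    a = (chrom1, pos1 // bin_size)
    b = (chrom2, pos2 // bin_size)
    return (a, b) if a <= b else (b, a)


def reduce_to_bin_pairs(candidates: Iterable[Candidate]) -> Set[Tuple[Bin, Bin]]:
    """Collapse all candidate alignments of one read pair to distinct bin pairs."""
    return {to_bin_pair(*c) for c in candidates}


def is_uni_bin_pair(candidates: Iterable[Candidate]) -> bool:
    """True when every candidate alignment lands in the same bin pair."""
    return len(reduce_to_bin_pairs(candidates)) == 1
```

For example, a multi-read with candidates `("chr1", 10_000, "chr1", 500_000)` and `("chr1", 30_000, "chr1", 510_000)` maps to a single bin pair and is therefore treated as a uni-bin pair, whereas moving the first end to position 90,000 would place it in a different bin and leave the read as a multi-bin pair.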

[Figure: bar plots for IMR90 replicates rep1-rep6. Panel a: # of Reads. Panel b: Percentage of Multi-reads. Bars: read end 1 and read end 2, each without (w/o) and with chimeric reads. Percentages printed on the bars, in extracted order: rep1 10.68%, 10.74%, 12.14%, 12.24%; rep2 9.89%, 9.93%, 11.35%, 11.35%; rep3 10.78%, 10.86%, 11.03%, 11.03%; rep4 9.64%, 9.62%, 10.74%, 10.73%; rep5 9.6%, 9.63%, 10.67%, 10.61%; rep6 10.31%, 10.39%, 10.82%, 10.81%.]

Supplementary Figure 3 Multi-reads due to chimeric reads (IMR90). Both chimeric reads and multi-reads require extra processing to rescue. a. Numbers of read ends with and without chimeric rescue are displayed along with the percentage of these that are multi-reads. Darker shades on the bars represent multi-reads. b. Same information as (a), but displayed in terms of percentages of multi-reads (y-axis). As expected, chimeric reads also lead to larger percentages of multi-reads. The actual percentages for each category are displayed on top of each bar.

[Figure: panels a-e for the P. falciparum samples (Rings, Trophozoites Rep1, Trophozoites Rep2, Schizonts). a: # of Reads and b: Percentage of Multi-mapping Reads per sequencing lane, for read ends 1 and 2 without and with chimeric reads. c: Uni&Multi-mapping Bin-pair Contact Count against Uni-mapping Bin-pair Contact Count. d: Change in the Number of Significant Contacts (*100%), Gain and Loss, against Multi-mapping Reads Posterior Probability Thresholding (0.5-0.9) at FDR 0.001, 0.01, and 0.05. e: counts of significant contacts under the Uni-setting and Uni&Multi-setting in the groups Uni (FDR 1%), Uni.Specific (Uni&Multi FDR 1%), Uni.Specific (Uni&Multi FDR 10%), Uni&Multi (FDR 1%), Uni&Multi.Specific (Uni FDR 1%), and Uni&Multi.Specific (Uni FDR 10%).]

Supplementary Figure 4 Multi-reads due to chimeric reads and improvement in the number of significant contacts due to multi-reads (P. falciparum). a, b. Same as Supplementary Figure 3a, b, but for P. falciparum. c. mHi-C leads to improved bin coverage under the Uni&Multi-setting compared to the Uni-setting across all the P. falciparum samples. The dashed line is y = x. d. Percentage change in the numbers of significant contacts: red and blue depict gain and loss of the Uni&Multi-setting compared to the Uni-setting, respectively. e. Recovery of significant contacts identified at FDR 1% by analysis at FDR 10%. Uni&Multi-setting.Specific (Uni FDR 10%) is the set of significant contacts identified at 1% FDR by the Uni&Multi-setting that remain unrecoverable by the Uni-setting even with a liberal FDR of 10%. A more detailed explanation is provided in Supplementary Figure 10.

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting, panels a and b. Axis labels: Chromosome 3 (65.20-83.70 MB) and Chromosome 1 (74.50-93.00 MB).]

Supplementary Figure 5 Gaps in contact matrices are filled in after incorporating multi-reads (IMR90). Contact matrices of IMR90 combined replicates (rep1-rep6) under Uni- and Uni&Multi-settings for regions chr2:105,000,000-142,000,000 (a) and chr3:65,200,000-83,700,000 (b).

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting, panels a and b. Axis labels: Chromosome 4 (117.55-136.05 MB) and Chromosome 5 (25.00-43.50 MB).]

Supplementary Figure 6 Gaps in contact matrices are filled in after incorporating multi-reads (IMR90). Example contact matrices are from chr4:117,550,000-136,050,000 (a) and chr5:25,000,000-43,500,000 (b).

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting, panels a and b. Axis labels: Chromosome 6 (17.30-35.80 MB) and Chromosome X (80.50-99.00 MB).]

Supplementary Figure 7 Gaps in contact matrices are filled in after incorporating multi-reads (IMR90). Example contact matrices are from chr6:17,300,000-35,800,000 (a) and chrX:80,500,000-99,000,000 (b).

[Figure: scatter plots of log(Uni&Multi-mapping Bin-pair Contact Count) against log(Uni-mapping Bin-pair Contact Count) for IMR90 rep1-rep6, colored by Count. Panel a: threshold 0.5. Panel b: threshold 0.9.]

Supplementary Figure 8 Bin coverage improvement of the Uni&Multi-setting compared to the Uni-setting for IMR90 at the individual replicate level for two different allocation probability thresholds. The Uni&Multi-setting in (a) includes multi-reads with posterior probability > 0.5, whereas (b) depicts stricter filtering with an allocation probability greater than 0.9.
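The effect of the two thresholds can be sketched as follows. This is an illustrative simplification, assuming that each multi-read candidate contributes one count to its bin pair when its allocation probability exceeds the threshold; the function name and input layout are hypothetical and do not come from the mHi-C code.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def contact_counts(uni_bin_pairs: Iterable[Hashable],
                   multi_allocations: Iterable[Tuple[Hashable, float]],
                   threshold: float = 0.5) -> Counter:
    """Count contacts per bin pair under the Uni&Multi-setting.

    uni_bin_pairs: one bin pair per uni-read.
    multi_allocations: (bin pair, allocation probability) per multi-read candidate.
    A candidate is kept only when its probability is strictly above `threshold`.
    """
    counts = Counter(uni_bin_pairs)
    for bin_pair, prob in multi_allocations:
        if prob > threshold:
            counts[bin_pair] += 1
    return counts
```

With `threshold=0.5`, a candidate allocated with probability 0.7 adds to the bin-pair count, while with `threshold=0.9` it is dropped and the count falls back to the uni-read value, which is the difference between panels (a) and (b).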

[Figure: one panel per IMR90 replicate (Rep1-Rep6), each split by FDR (0.001, 0.01, 0.05). Y-axis: Percentage Change in the Number of Significant Contacts (Gain, Loss). X-axis: Multi-mapping Reads Posterior Probability Thresholding (0.5-0.9).]

Supplementary Figure 9 Comparison of significant contacts (IMR90). Percentage change in the numbers of significant contacts gained (red) and lost (blue) by the Uni&Multi-setting compared to the Uni-setting across individual IMR90 replicates for varying FDR and allocation probability thresholds.

[Figure: counts of significant contacts under the Uni-setting and the Uni&Multi-setting, aggregated over the six IMR90 replicates.]

Uni-setting (FDR 1%): 865,510
Uni&Multi-setting (FDR 1%): 1,077,861
Uni-setting.Specific (Uni&Multi FDR 1%): 70,689
Uni-setting.Specific (Uni&Multi FDR 10%): 9,804
Uni&Multi-setting.Specific (Uni FDR 1%): 283,040
Uni&Multi-setting.Specific (Uni FDR 10%): 159,301

Supplementary Figure 10 Recovery of significant contacts identified at 1% FDR by analysis at 10% FDR, aggregated over all six replicates of the IMR90 dataset. The detailed descriptions of the groups are as follows:
Uni-setting (FDR 1%): # of significant contacts identified by the Uni-setting at 1% FDR.
Uni&Multi-setting (FDR 1%): # of significant contacts identified by the Uni&Multi-setting at 1% FDR.
Uni-setting.Specific (Uni&Multi FDR 1%): # of significant contacts identified by the Uni-setting (FDR 1%) but not by the Uni&Multi-setting at 1% FDR.
Uni-setting.Specific (Uni&Multi FDR 10%): # of significant contacts identified by the Uni-setting (FDR 1%) but not by the Uni&Multi-setting at 10% FDR.
Uni&Multi-setting.Specific (Uni FDR 1%): # of significant contacts identified by the Uni&Multi-setting (FDR 1%) but not by the Uni-setting at 1% FDR.
Uni&Multi-setting.Specific (Uni FDR 10%): # of significant contacts identified by the Uni&Multi-setting (FDR 1%) but not by the Uni-setting at 10% FDR.
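The group definitions imply a simple consistency check: the contacts shared by both settings at 1% FDR can be obtained from either side. The sketch below works through that arithmetic with the six counts printed in the figure. The assignment of counts to group labels is an assumption (the labels and numbers are not explicitly paired in the extracted figure); it is the assignment under which both settings agree on the number of shared contacts.

```python
# Counts printed in Supplementary Figure 10; label assignment is assumed.
counts = {
    "Uni-setting (FDR 1%)": 865_510,
    "Uni&Multi-setting (FDR 1%)": 1_077_861,
    "Uni-setting.Specific (Uni&Multi FDR 1%)": 70_689,
    "Uni-setting.Specific (Uni&Multi FDR 10%)": 9_804,
    "Uni&Multi-setting.Specific (Uni FDR 1%)": 283_040,
    "Uni&Multi-setting.Specific (Uni FDR 10%)": 159_301,
}

# Contacts significant under both settings at 1% FDR, computed from each side.
common_from_uni = (counts["Uni-setting (FDR 1%)"]
                   - counts["Uni-setting.Specific (Uni&Multi FDR 1%)"])
common_from_multi = (counts["Uni&Multi-setting (FDR 1%)"]
                     - counts["Uni&Multi-setting.Specific (Uni FDR 1%)"])

# Share of Uni&Multi-specific contacts that the Uni-setting still misses at 10% FDR.
unrecovered_fraction = (counts["Uni&Multi-setting.Specific (Uni FDR 10%)"]
                        / counts["Uni&Multi-setting.Specific (Uni FDR 1%)"])

print(common_from_uni, common_from_multi, round(unrecovered_fraction, 3))
```

Both subtractions give 794,821 shared contacts, and more than half of the contacts specific to the Uni&Multi-setting remain unrecovered by the Uni-setting at the liberal 10% FDR.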

[Figure: counts of significant contacts under the Uni-setting and the Uni&Multi-setting for each IMR90 replicate (Threshold 0.5).]

Group                                      rep1     rep2     rep3     rep4     rep5    rep6
Uni-setting (FDR 1%)                       186,392  198,644  173,575  257,983  19,268  29,648
Uni&Multi-setting (FDR 1%)                 234,654  247,014  214,053  306,674  29,961  45,505
Uni-setting.Specific (Uni&Multi FDR 1%)    15,701   14,497   16,804   21,049   1,002   1,636
Uni-setting.Specific (Uni&Multi FDR 10%)   1,569    1,805    2,753    3,375    60      242
Uni&Multi-setting.Specific (Uni FDR 1%)    63,963   62,867   57,282   69,740   11,695  17,493
Uni&Multi-setting.Specific (Uni FDR 10%)   32,389   31,310   36,004   43,796   11,243  4,559

Supplementary Figure 11 Recovery of significant contacts identified at FDR 1% by analysis at FDR 10% for each of the six replicates of IMR90. Uni&Multi-setting.Specific (Uni FDR 10%) is the set of significant contacts identified at 1% FDR by the Uni&Multi-setting that remain unrecoverable by the Uni-setting even with a liberal FDR of 10%.

[Figure: panels for rep5 and rep6 comparing the Uni-setting and the Uni&Multi-setting. a: True Positive Rate against False Positive Rate. b: Precision against Recall.]

Supplementary Figure 12 ROC and PR curves for replicates 5 and 6 of IMR90. Sets of "True" interactions and "True" non-interactions are defined by reproducible significant/insignificant contacts across replicates 1-4 under both the Uni-setting and the Uni&Multi-setting (see Methods). Significant contacts of replicates 5 and 6 are used to compare ROC and PR curves between the Uni- and Uni&Multi-settings.
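For readers unfamiliar with how such curves are traced, the following sketch computes ROC and PR points from a ranked list of contacts. It is a generic illustration rather than the procedure from the Methods: it assumes each contact has a score where higher means more significant (for example a negated q-value), that labels mark membership in the "True" interaction set, and it ignores tied scores.

```python
from typing import List, Sequence, Tuple


def roc_pr_points(scores: Sequence[float], labels: Sequence[bool]
                  ) -> Tuple[List[Tuple[float, float]], List[Tuple[float, float]]]:
    """Return ROC points (FPR, TPR) and PR points (recall, precision).

    Contacts are visited from the highest to the lowest score; each visit
    lowers the decision threshold by one contact.
    """
    n_pos = sum(1 for is_pos in labels if is_pos)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true interaction and one non-interaction")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    roc = [(0.0, 0.0)]
    pr = []
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr
```

Running this once per setting on the same "True" sets gives the two curves compared in each panel.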

[Figure: Multi-reads Utilization Approaches Comparison (IMR90). Y-axis: Number of Valid Read Pairs after Duplicate Removal. X-axis: rep1-rep6. Legend: Uni-Setting, AlignerSelect, DistanceSelect, SimpleSelect, mHi-C.]

Supplementary Figure 13 Numbers of valid read pairs identified by different multi-reads allocation strategies at the individual replicate level (IMR90).

[Figure: one panel per IMR90 replicate (rep1-rep6) comparing SimpleSelect and mHi-C. X-axis: Genomic Distance (0 to 1.0e+07).]

Supplementary Figure 14 Comparison of genomic distances of significant contacts between SimpleSelect and mHi-C for six replicates of IMR90 (FDR < 0.001). SimpleSelect over-emphasizes the contact distance prior and thus assigns more multi-reads to short-range positions.

[Figure: Plasmodium Significant Contact Difference among Stages, at FDR 0.001, 0.01, and 0.05. Panels: Rings, Trophozoites, Schizonts, each compared against the other two stages. Y-axis: Percentage of Significant Contacts. Legend: Uni-setting, SimpleSelect, mHi-C.]

Supplementary Figure 15 Comparison of significant contacts among the three life stages of P. falciparum. The y-axis in each panel (rings, trophozoites, and schizonts) depicts the percentage of contacts that are significant only in the panel condition compared to the other two conditions. Under varying FDR thresholds, mHi-C and the Uni-setting tend to have similar percentages of differential interactions among the ring, trophozoite, and schizont life stages. In contrast, SimpleSelect tends to underestimate differential interactions because it over-emphasizes the contact distance prior.
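The quantity on the y-axis can be sketched with set operations. This is an assumption-laden illustration: it treats each stage as a set of significant contacts, compares the panel stage against one other stage at a time, and uses the panel stage's total as the denominator, none of which is spelled out in the caption. The function name is hypothetical.

```python
from typing import Hashable, Set


def stage_specific_percentage(panel_stage: Set[Hashable],
                              other_stage: Set[Hashable]) -> float:
    """Percentage of the panel stage's significant contacts absent from another stage."""
    if not panel_stage:
        return 0.0
    only_in_panel = panel_stage - other_stage
    return 100.0 * len(only_in_panel) / len(panel_stage)
```

For instance, if four contacts are significant in rings and two of them are also significant in trophozoites, the rings panel would show 50% for the trophozoites comparison.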

[Figure: a. IMR90 Significant Contacts Reproducibility at FDR 0.05 and at FDR 0.1; one panel per replicate (rep1-rep6), each compared against the other five replicates; y-axis: Reproducibility (%). b. IMR90 Significant Contacts Reproducibility at FDR 0.05 and at FDR 0.1 w.r.t. Contact Distance; x-axis: Contact Distance (Base Pair), 200K to 10M; y-axis: Reproducibility (%). Legend: Uni-setting Specific, Uni&Multi-setting Specific, Common compared to Uni-setting, Common compared to Uni&Multi-setting.]

Supplementary Figure 16 Reproducibility of significant contacts (IMR90). a. Significant contacts are classified into three categories: Uni-setting specific, Uni&Multi-setting specific, or common to both. Reproducibility is evaluated by the percentage of significant contacts reproduced in another replicate within the same category. b. Reproducibility of significant contacts stratified by genomic distance. Reproducibility is evaluated by the percentage of significant contacts reproduced in another replicate within the same category and genomic distance range.
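The distance-stratified measure in panel b can be sketched as below. The sketch assumes contacts of one category are given as hypothetical `(chrom, start1, start2)` tuples of bin start coordinates and that distance ranges are 200 kb wide, matching the tick spacing on the x-axis; the actual stratification used for the figure may differ.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Contact = Tuple[str, int, int]


def reproducibility_by_distance(contacts_a: Set[Contact],
                                contacts_b: Set[Contact],
                                step: int = 200_000) -> Dict[int, float]:
    """Percentage of replicate A's contacts also found in replicate B, per distance range.

    Keys are the lower bounds of the distance ranges [k*step, (k+1)*step).
    """
    total: Dict[int, int] = defaultdict(int)
    reproduced: Dict[int, int] = defaultdict(int)
    for contact in contacts_a:
        _, start1, start2 = contact
        stratum = (abs(start2 - start1) // step) * step
        total[stratum] += 1
        if contact in contacts_b:
            reproduced[stratum] += 1
    return {s: 100.0 * reproduced[s] / total[s] for s in sorted(total)}
```

Calling the function once per category and replicate pair yields one curve per category.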

[Figure: Average Number of Significant Contacts at FDR 5% for Uni-setting.Specific, Uni&Multi-setting.Specific, and Common contacts. a: by chromHMM state (1_TssA, 2_TssAFlnk, 3_TxFlnk, 4_Tx, 5_TxWk, 6_EnhG, 7_Enh, 8_ZNF/Rpts, 9_Het, 10_TssBiv, 11_BivFlnk, 12_EnhBiv, 13_ReprPC, 14_ReprPCWk, 15_Quies). b: by ChIP-seq signal (PolII, CTCF, p300, p65, H3K27ac, H3K4me3, H3K4me1, H3K36me3, H3K27me3, Others).]

Supplementary Figure 17 Quantification of significant contacts for chromHMM states and ChIP-seq peak regions (IMR90). a. Grouping the significant contacts into three groups (Uni-setting specific, Uni&Multi-setting specific, common to both settings) reveals the largest enrichment differences in chromHMM annotation categories related to repetitive regions, such as Zinc Finger & Repeats and Heterochromatin. b. Average number of significant contacts across regions with a variety of ChIP-seq signals. Red/green labels denote smaller/larger differences between Uni-setting.Specific and Uni&Multi-setting.Specific compared to the differences observed in the "Others" category, which depicts non-peak regions.

[Figure: genome browser view (hg19), chr1 (p36.21-p36.13), ticks at 16,500,000-18,000,000. Tracks: marginal Hi-C signal for rep1-6 under the Uni-setting and the Uni&Multi-setting (scale 0-40,000); peaks and two replicate coverage tracks (scale 0-40) for CTCF, H3K27me3, H3K27ac, H3K36me3, H3K4me1, and H3K4me3; Basic Gene Annotation Set from ENCODE/GENCODE Version 17; RNA-seq minus-strand and plus-strand signal (scale 0-10).]

Supplementary Figure 18 Marginalized Hi-C signal (contact counts aggregated across the genomic coordinates for six replicates of IMR90), ChIP-seq coverage and peaks, and gene expression for chr1:16,000,000-18,000,000. Highlighted in grey is a region with significantly different marginal Hi-C signal between the Uni-setting and the Uni&Multi-setting.
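One way to read "marginalized Hi-C signal" is as the per-bin sum of contact counts over all partner bins. The sketch below shows that aggregation for a sparse contact matrix stored as a dictionary of bin pairs. The data layout, the function name, and the choice to count a diagonal entry once are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def marginal_signal(contact_counts: Dict[Tuple[Hashable, Hashable], int]) -> Counter:
    """Sum contact counts per bin over all of its partner bins.

    contact_counts maps an unordered bin pair (stored once) to its count.
    """
    marginal: Counter = Counter()
    for (bin_i, bin_j), n in contact_counts.items():
        marginal[bin_i] += n
        if bin_j != bin_i:  # avoid double counting self-contacts
            marginal[bin_j] += n
    return marginal
```

Computing this once from the Uni-setting counts and once from the Uni&Multi-setting counts gives the two Hi-C tracks being compared.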

[Figure: genome browser view (hg19), chr2 (q13-q14.1), ticks at 113,500,000-115,000,000. Tracks: marginal Hi-C signal for rep1-6 under the Uni-setting and the Uni&Multi-setting (scale 0-40,000); peaks and two replicate coverage tracks (scale 0-40) for CTCF, H3K27me3, H3K27ac, H3K36me3, H3K4me1, and H3K4me3; Basic Gene Annotation Set from ENCODE/GENCODE Version 17; RNA-seq minus-strand and plus-strand signal (scale 0-10).]

Supplementary Figure 19 Marginalized Hi-C signal (contact counts aggregated across the genomic coordinates for six replicates of IMR90), ChIP-seq coverage and peaks, and gene expression for chr2:113,460,000-116,000,000. Highlighted in grey is a region with significantly different marginal Hi-C signal between the Uni-setting and the Uni&Multi-setting.

[Figure: genome browser view (hg19), chr9 (q13), ticks at 66,300,000-66,900,000. Tracks: marginal Hi-C signal for rep1-6 under the Uni-setting and the Uni&Multi-setting (scale 0-40,000); peaks and two replicate coverage tracks (scale 0-40) for CTCF, H3K27me3, H3K27ac, H3K36me3, H3K4me1, and H3K4me3; Basic Gene Annotation Set from ENCODE/GENCODE Version 17; RNA-seq minus-strand and plus-strand signal (scale 0-10).]

Supplementary Figure 20 Marginalized Hi-C signal (contact counts aggregated across the genomic coordinates for six replicates of IMR90), ChIP-seq coverage and peaks, and gene expression for chr9:66,250,000-66,950,000. Highlighted in grey is a region with significantly different marginal Hi-C signal between the Uni-setting and the Uni&Multi-setting.

[Figure: tracks for 24mer Mappability, ChromHMM, and GENCODE V19 genes, followed by interaction tracks for Uni-setting Rep1-Rep6 and Uni&Multi-setting Specific Rep1-Rep6.]

Supplementary Figure 21 Significant enhancer-promoter interactions under Uni- and Uni&Multi-settings across 6 IMR90 replicates (Chromosome 17). This is the individual replicate level data for main Fig. 2d. The corresponding color labels are consistent with the ChromHMM 15-state model at http://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html

[Figure: two examples, Chr1 and Chr7, each with tracks for 24mer Mappability, ChromHMM, GENCODE V19 genes, Uni-setting, and Uni&Multi-setting Specific.]

Supplementary Figure 22 Examples of significant enhancer-promoter interactions reproducible among 6 replicates under Uni- and Uni&Multi-settings (IMR90). These regions are from chromosomes 1 and 7.

[Figure: bar plot. Y-axis: Percentage of Genes. X-axis: Gene Expression (TPM) in the categories 0, (0, 1), [1, 5), [5, 10), [10, Inf). Legend: Uni-setting, Uni&Multi-setting, Common.]

Supplementary Figure 23 Expression distribution of genes whose promoters have significant promoter interactions (IMR90). Genes harboring significant interactions (at 5% FDR) within their promoters are grouped into different gene expression categories.
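The grouping into expression categories can be written down directly from the x-axis labels. The sketch below is a minimal illustration with hypothetical function names; the interval boundaries are those printed on the axis, and TPM values are assumed to be non-negative.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = ["0", "(0, 1)", "[1, 5)", "[5, 10)", "[10, Inf)"]


def tpm_category(tpm: float) -> str:
    """Assign a TPM value to one of the expression categories on the x-axis."""
    if tpm < 0:
        raise ValueError("TPM must be non-negative")
    if tpm == 0:
        return "0"
    if tpm < 1:
        return "(0, 1)"
    if tpm < 5:
        return "[1, 5)"
    if tpm < 10:
        return "[5, 10)"
    return "[10, Inf)"


def expression_distribution(tpms: Iterable[float]) -> Dict[str, float]:
    """Percentage of genes falling into each expression category."""
    counts = Counter(tpm_category(t) for t in tpms)
    total = sum(counts.values())
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in CATEGORIES}
```

Applying `expression_distribution` separately to the genes of each group (Uni-setting, Uni&Multi-setting, Common) gives one set of bars per group.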

[Figure: # of TADs under the Uni-setting and the Uni&Multi-setting. a: per chromosome (chr1-chr22, chrX). b: per replicate (rep1-rep6).]

Supplementary Figure 24 The number of topologically associating domains (TADs) detected for each chromosome under the Uni-setting and the Uni&Multi-setting (IMR90). a. Total number of TADs identified across six replicates for each chromosome. b. Total number of TADs identified across 23 chromosomes for each replicate.

[Figure: panels a-e comparing the Uni-setting and the Uni&Multi-setting across rep1-rep6. a: % of TADs with CTCF Peaks at Both Boundaries. b: % of TADs with Convergent CTCF Motif. c: % of CTCF Motif at TAD Boundaries (Convergent, Tandem Right, Tandem Left, Divergent). d: % of Adjusted TADs with Convergent CTCF Motif at Boundaries. e: False Discovery Rate of TADs Detection.]

Supplementary Figure 25 Comparison of CTCF peaks at topologically associating domain (TAD) boundaries under the Uni-setting and the Uni&Multi-setting across six replicates of IMR90. a. Percentages of TADs that have CTCF peaks at boundaries. b. Percentages of TADs that have both CTCF peaks and convergent CTCF motifs at the boundaries. c. Percentages of four types of CTCF motif orientations at TAD boundaries. Convergent motif pairs are those with a forward strand motif upstream of TAD boundaries and a reverse strand motif downstream of TAD boundaries. Tandem Right, similarly, represents forward-forward motif pairs. Tandem Left is reverse-reverse motif pairs. Divergent refers to reverse-forward motif pairs. d. Some TAD boundaries are adjusted under the Uni&Multi-setting. Box plots depict the percentage of adjusted TADs that have convergent CTCF motifs at the boundaries. e. False discovery rate of TADs detected under the two settings. TADs that are not reproducible and lack convergent CTCF motifs at the TAD boundaries are considered false positives.
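The four orientation classes and the false-positive rule translate into a small lookup. In this sketch the strands are encoded as "+" (forward) and "-" (reverse), which is a notational assumption, and the function names are hypothetical; the class definitions and the false-positive criterion follow the caption.

```python
ORIENTATIONS = {
    ("+", "-"): "Convergent",    # forward upstream, reverse downstream
    ("+", "+"): "Tandem Right",  # forward-forward
    ("-", "-"): "Tandem Left",   # reverse-reverse
    ("-", "+"): "Divergent",     # reverse-forward
}


def motif_pair_orientation(upstream_strand: str, downstream_strand: str) -> str:
    """Classify a pair of CTCF motifs at the two boundaries of a TAD."""
    try:
        return ORIENTATIONS[(upstream_strand, downstream_strand)]
    except KeyError:
        raise ValueError("strands must be '+' or '-'") from None


def is_false_positive(reproducible: bool, has_convergent_motifs: bool) -> bool:
    """A TAD counts as a false positive only if it fails both criteria."""
    return (not reproducible) and (not has_convergent_motifs)
```

Under this rule, a TAD that is not reproducible but does carry convergent CTCF motifs at its boundaries is not counted as a false positive.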

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting, panels a and b. Axis labels: Chromosome 6 (5.35-33.85 MB) and Chromosome 9 (15.15-43.65 MB).]

Supplementary Figure 26 Novel topologically associating domains (TADs) with CTCF peaks at TAD boundaries (IMR90). Gene tracks, 24mer mappability tracks, and CTCF peaks are displayed above the contact matrices. a. Example on chr6:5,350,000-33,850,000. Even in the absence of obvious low-mappability gaps in the contact matrix, multi-reads can enhance the existing interaction signal and reveal detectable TAD structures supported by CTCF peaks. b. Example on chr9:15,150,000-43,650,000. A TAD structure, supported by CTCF peaks at the TAD boundaries, becomes detectable as multi-reads fill in the gap in the contact matrix.

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting, panels a and b. Axis labels: Chromosome 1 (66.80-95.30 MB) and Chromosome 5 (0.00-28.50 MB).]

Supplementary Figure 27 Existing topologically associating domains (TADs) with adjusted boundaries supported by CTCF peaks at the new TAD boundaries (IMR90). a. An example from chr1:66,800,000-95,300,000. b. An example from chr5:0-28,500,000.

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting. Chromosome 12, 0.00-25.00 MB; chromosome 13, 42.80-71.30 MB.]

Supplementary Figure 28 Existing topologically associating domains (TADs) with adjusted boundaries supported by CTCF peaks at the new TAD boundaries (IMR90). a. An example from chr12:0-25,000,000. b. An example from chr13:42,800,000-71,300,000.

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting. Chromosome 2, 105.60-134.10 MB; chromosome 3, 60.00-88.50 MB.]

Supplementary Figure 29 False positive topologically associating domains (TADs) detected by the Uni-setting due to missing reads in low-mappability regions (IMR90). TADs that are split by white gaps are no longer detected once multi-reads are incorporated, indicating that they are very likely false positives under the Uni-setting. a. Example on chr2:105,600,000-134,100,000. b. Example on chr3:60,000,000-88,500,000.

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting. Chromosome 4, 0.00-28.50 MB; chromosome 16, 0.00-28.50 MB.]

Supplementary Figure 30 False positive topologically associating domains (TADs) detected by the Uni-setting due to missing reads in low-mappability regions (IMR90). a. Example on chr4:0-28,500,000. b. Example on chr16:0-28,500,000.

[Figure: contact matrices under the Uni-setting and the Uni&Multi-setting. Chromosome 21, 14.35-42.85 MB; chromosome X, 60.80-117.80 MB.]

Supplementary Figure 31 False positive topologically associating domains (TADs) detected by the Uni-setting due to missing reads in low-mappability regions (IMR90). a. Example on chr21:14,350,000-42,850,000. b. Example on chrX:60,800,000-117,800,000.

[Figure: distributions of posterior probability for SimpleSelect and mHi-C. Panels: IMR90 (rep1) to IMR90 (rep6). Y-axis: posterior probability, 0.00 to 1.00.]

Supplementary Figure 32 Comparison of the allocation probability distributions between SimpleSelect and mHi-C for IMR90. Distributions of mHi-C allocation probabilities are displayed for both the mHi-C allocations (i.e., each multi-read is assigned to its most likely position pair) and the locations assigned by SimpleSelect. SimpleSelect ends up assigning many multi-reads to locations that mHi-C infers to be unlikely.

[Figure: two-dimensional count plots for IMR90 (Rep1) to IMR90 (Rep6). Axes: posterior probability (mHi-C) and posterior probability (SimpleSelect), each from 0.00 to 1.00; color scale: count. Percentages annotated on the panels: 59.11%, 68.99%, 66.09%, 67.36%, 64.26%, 65.91%.]

Supplementary Figure 33 Comparison of the allocation probabilities between SimpleSelect and mHi-C for IMR90. mHi-C allocation probabilities are displayed for both the mHi-C allocations (i.e., each multi-read is assigned to its most likely position pair) and the locations selected by SimpleSelect. SimpleSelect assigns 60-70% of multi-reads to locations that mHi-C deems less likely (points with x-axis > 0.5 and small y-axis).
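The comparison in this caption can be illustrated with a short sketch. It assumes a hypothetical input layout: for each multi-read, a list of mHi-C allocation probabilities over its candidate position pairs, plus the index of the candidate chosen by SimpleSelect. The 0.5 cutoff mirrors the "x-axis > 0.5" remark in the caption; the exact rule used to compute the percentages in the figure is not stated here, so this is one plausible reading rather than the authors' procedure.

```python
def simpleselect_disagreement(reads, high=0.5):
    """Fraction of multi-reads for which mHi-C is confident about one candidate
    (top probability > high) while SimpleSelect picked a different candidate."""
    if not reads:
        return 0.0
    flagged = 0
    for probs, chosen in reads:
        # Index of the candidate that mHi-C considers most likely.
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] > high and chosen != top:
            flagged += 1
    return flagged / len(reads)


# Toy input: (mHi-C probabilities per candidate, index chosen by SimpleSelect).
toy_reads = [
    ([0.9, 0.1], 1),          # mHi-C confident, SimpleSelect picks the unlikely one
    ([0.9, 0.1], 0),          # both agree
    ([0.6, 0.3, 0.1], 2),     # mHi-C confident, SimpleSelect disagrees
    ([0.4, 0.35, 0.25], 1),   # mHi-C not confident, so not flagged
]
share = simpleselect_disagreement(toy_reads)
```

With a top probability above 0.5, every other candidate necessarily has probability below 0.5, so "a different candidate" and "a low-probability candidate" coincide in this sketch.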

Supplementary Table 1: Hi-C and mHi-C terminology.

1a. Read ends alignment
    Description: Each end of a paired-end read is aligned independently.
    Standard pipeline: Uni-reads
    mHi-C pipeline: Uni&multi-reads

1b. Chimeric read rescue
    Description: Read end spans ligation site. They are rescued by trimming and re-alignment.
    Standard pipeline: Uni-chimeric ends
    mHi-C pipeline: Uni&multi-chimeric ends

2a. Read end pairing
    Description: Aligned read ends form an alignment position pair.
    Standard pipeline: Uni-read pairs
    mHi-C pipeline: Uni&multi-read pairs

2b. Multi-reads reduced to uni-reads due to validation checking
    Description: Only one of the alignment position pairs passes the validation checking.
    Standard pipeline: Valid uni-reads
    mHi-C pipeline: Valid uni-reads and multi-reads reduced to uni-reads

3a. Bin pair
    Description: Bin the genome by a fixed-size window. Read end alignments fall within bins and form bin pairs indicating contact from one bin to another.
    Standard pipeline: Uni-reads binning
    mHi-C pipeline: Uni&multi-reads binning

3b. Multi-reads reduced to uni-reads due to binning
    Description: Each read end has multiple alignment positions falling within the same bin, thus supporting the same bin pair.
    Standard pipeline: Uni-reads bin pairs
    mHi-C pipeline: Uni-reads and multi-reads reduced to uni-bin pairs

4. Contact count
    Description: Each read pair alignment represents one contact. Contact count for bin pair j and k is the total number of read pairs with ends in bin j and bin k.
    Standard pipeline: Uni-mapping contacts
    mHi-C pipeline: Uni&multi-mapping contacts

5. Contact matrix
    Description: The entry at row j and column k indicates the contact count of bin j interacting with bin k.
    Standard pipeline: Uni-setting contact matrix
    mHi-C pipeline: Uni&Multi-setting contact matrix
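Rows 3a to 5 of the terminology table (binning, reduction of multi-reads to uni-bin pairs, contact counting) can be sketched in a few lines. This is a simplified illustration under stated assumptions: a single chromosome, integer base-pair positions, a 40 kb bin size as in the pipeline figures, and hypothetical function names. Multi-reads that still span several bin pairs are skipped here, since allocating them is the job of the mHi-C model.

```python
from collections import Counter

BIN_SIZE = 40_000  # 40 kb fixed-size bins (assumed default for this sketch)


def to_bin_pair(pos1, pos2, bin_size=BIN_SIZE):
    # Map two read-end positions to an ordered (j, k) bin pair with j <= k.
    j, k = pos1 // bin_size, pos2 // bin_size
    return (j, k) if j <= k else (k, j)


def candidate_bin_pairs(candidates, bin_size=BIN_SIZE):
    # Distinct bin pairs supported by a read's candidate alignment position pairs.
    return {to_bin_pair(p1, p2, bin_size) for p1, p2 in candidates}


def contact_counts(reads, bin_size=BIN_SIZE):
    # Count one contact per read whose candidates all fall into one bin pair:
    # uni-reads, and multi-reads reduced to uni-bin pairs by binning.
    counts = Counter()
    for candidates in reads:
        pairs = candidate_bin_pairs(candidates, bin_size)
        if len(pairs) == 1:
            counts[next(iter(pairs))] += 1
    return counts


toy_reads = [
    [(10_000, 130_000)],                       # uni-read: bins (0, 3)
    [(15_000, 125_000), (30_000, 150_000)],    # multi-read, both candidates in (0, 3)
    [(10_000, 130_000), (90_000, 500_000)],    # multi-read spanning (0, 3) and (2, 12)
]
counts = contact_counts(toy_reads)
```

In this toy input, bin pair (0, 3) receives two contacts: one from the uni-read and one from the multi-read that binning reduces to a uni-bin pair. The third read remains a multi-bin pair and contributes nothing until it is allocated.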

Supplementary Table 2: Read ends alignment summary (IMR90, without chimeric reads)

Replicate  Uni&Multi-reads             Multi-reads               Multi-reads percentage
           read1         read2         read1        read2        read1    read2
Rep1       353,574,708   341,265,338   37,761,684   36,649,159   10.68%   10.74%
Rep2       374,365,826   371,230,377   37,013,319   36,857,146   9.89%    9.93%
Rep3       529,904,684   516,011,586   57,109,027   56,055,103   10.78%   10.86%
Rep4       461,578,000   452,574,758   44,507,061   43,523,606   9.64%    9.62%
Rep5       199,793,181   195,310,510   19,184,321   18,812,786   9.60%    9.63%
Rep6       177,011,861   173,622,942   18,251,233   18,035,972   10.31%   10.39%

Supplementary Table 3: Read ends alignment summary (P. falciparum, without chimeric reads)

Library               Lane  Uni&Multi-reads          Multi-reads            Multi-reads percentage
                            read1       read2        read1      read2       read1    read2
RINGS                 L1    46,365,967  43,948,030   4,352,801  4,047,331   9.39%    9.21%
TROPHOZOITES-XL-AGGG  L1    8,129,229   8,050,531    1,018,830  1,008,770   12.53%   12.53%
TROPHOZOITES-XL-AGGG  L2    4,909,224   4,821,655    619,146    607,577     12.61%   12.60%
TROPHOZOITES-XL-CCAT  L1    13,349,742  13,329,941   1,742,105  1,746,269   13.05%   13.10%
TROPHOZOITES-XL-CCAT  L2    8,319,321   8,232,532    1,096,055  1,087,784   13.17%   13.21%
SCHIZONTS             L1    16,673,137  14,229,707   1,344,471  1,276,707   8.06%    8.97%
SCHIZONTS             L2    76,051,231  64,865,424   6,135,417  5,815,985   8.07%    8.97%

Supplementary Table 4: Read ends chimeric reads alignment (IMR90)

Replicate  Uni&Multi-reads           Multi-reads             Multi-reads percentage
           read1        read2        read1       read2       read1    read2
Rep1       4,280,256    4,077,870    519,580     499,065     12.14%   12.24%
Rep2       13,497,407   13,438,736   1,532,001   1,525,700   11.35%   11.35%
Rep3       16,871,415   17,202,224   1,860,637   1,896,696   11.03%   11.03%
Rep4       14,789,771   15,579,019   1,589,062   1,671,639   10.74%   10.73%
Rep5       6,672,000    6,201,642    711,907     657,987     10.67%   10.61%
Rep6       6,956,254    7,065,126    752,798     763,882     10.82%   10.81%

Supplementary Table 5: Read ends chimeric reads alignment (P. falciparum)

Library               Lane  Uni&Multi-reads        Multi-reads        Multi-reads percentage
                            read1      read2       read1    read2     read1    read2
RINGS                 L1    1,452,714  1,411,267   122,701  133,846   8.45%    9.48%
TROPHOZOITES-XL-AGGG  L1    189,692    176,294     22,357   20,983    11.79%   11.90%
TROPHOZOITES-XL-AGGG  L2    130,980    101,837     15,240   11,950    11.64%   11.73%
TROPHOZOITES-XL-CCAT  L1    265,299    192,267     32,194   23,563    12.13%   12.26%
TROPHOZOITES-XL-CCAT  L2    200,404    112,554     24,424   13,966    12.19%   12.41%
SCHIZONTS             L1    661,885    516,572     48,453   42,754    7.32%    8.28%
SCHIZONTS             L2    2,976,218  2,470,641   217,868  205,473   7.32%    8.32%

Supplementary Table 6: Read pair alignment summary (IMR90)

Replicate  # of raw read pairs  Uni-reads     Multi-reads  Multi/Uni %
Rep1       397,187,388          236,251,615   56,152,873   23.77%
Rep2       440,242,230          257,962,310   61,482,818   23.83%
Rep3       621,089,009          378,586,153   81,461,564   21.52%
Rep4       529,157,703          347,273,400   63,552,571   18.30%
Rep5       234,133,577          148,849,946   26,735,106   17.96%
Rep6       208,710,657          127,610,420   26,770,773   20.98%

Supplementary Table 7: Read pair alignment summary (P. falciparum)

Library               Lane  # of raw read pairs  # of mapped read pairs     Multi/Uni %
                                                 Uni-reads    Multi-reads
RINGS                 L1    81,125,594           30,603,321   6,074,171     19.85%
TROPHOZOITES-XL-AGGG  L1    14,480,663           5,578,838    1,354,159     24.27%
TROPHOZOITES-XL-AGGG  L2    8,552,718            3,342,229    787,039       23.55%
TROPHOZOITES-XL-CCAT  L1    22,981,731           9,115,189    2,314,766     25.39%
TROPHOZOITES-XL-CCAT  L2    14,552,658           5,634,443    1,389,965     24.67%
SCHIZONTS             L1    19,475,471           10,395,697   1,778,232     17.11%
SCHIZONTS             L2    88,858,478           47,389,009   8,108,793     17.11%

Supplementary Table 8: Read pairs after validation checking and binning (IMR90)

Replicate  Uni-mapping   Multi-reads                           Improvement %
                         One valid   Unique      Multiple      Uniquely         Multiple
                         read pair   bin pair    bin pairs     determined/Uni   bin-pairs/Uni
Rep1       107,339,823   6,112,613   772,408     29,048,297    6.41%            27.06%
Rep2       157,891,948   7,249,583   1,153,177   37,098,796    5.32%            23.50%
Rep3       119,121,224   4,898,820   1,082,997   27,859,672    5.02%            23.39%
Rep4       105,333,160   3,934,624   784,388     22,335,318    4.48%            21.20%
Rep5       43,107,975    1,605,786   300,713     9,108,976     4.42%            21.13%
Rep6       51,939,245    2,018,563   395,903     11,442,292    4.65%            22.03%
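The two "Improvement %" columns can be reproduced from the count columns. The sketch below assumes, as an inference that matches the tabulated values, that "Uniquely determined" is the sum of the "One valid read pair" and "Unique bin pair" counts, and that both percentages are taken relative to the Uni-mapping count. The function name is illustrative.

```python
def improvement_percentages(uni, one_valid, unique_bin, multi_bin):
    # Multi-reads resolved without the model, relative to uni-mapping read pairs.
    uniquely_determined = 100.0 * (one_valid + unique_bin) / uni
    # Multi-reads still spanning several bin pairs, relative to uni-mapping read pairs.
    multiple_bin_pairs = 100.0 * multi_bin / uni
    return uniquely_determined, multiple_bin_pairs


# IMR90 Rep1 counts from Supplementary Table 8.
rep1_unique, rep1_multi = improvement_percentages(
    uni=107_339_823, one_valid=6_112_613, unique_bin=772_408, multi_bin=29_048_297
)
print(f"{rep1_unique:.2f}% {rep1_multi:.2f}%")  # 6.41% 27.06%
```

The same arithmetic applied to Rep2 gives 5.32% and 23.50%, in agreement with the table.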

Supplementary Table 9: Read pair alignment summary after validation checking and binning (P. falciparum)

Library               Lane  Uni-reads    Multi-mapping reads               Improvement %
                                         One valid   Unique     Multiple   Uniquely         Multi-bin
                                         read pair   bin pair   bin pairs  determined/Uni   pair/Uni
RINGS                 L1    15,915,127   598,778     567,293    3,163,160  7.33%            19.88%
TROPHOZOITES-XL-AGGG  L1    2,430,762    116,916     100,109    716,311    8.93%            29.47%
TROPHOZOITES-XL-AGGG  L2    1,194,807    62,599      48,078     399,272    9.26%            33.42%
TROPHOZOITES-XL-CCAT  L1    4,027,027    198,463     165,777    1,245,384  9.04%            30.93%
TROPHOZOITES-XL-CCAT  L2    1,995,602    110,887     81,338     706,385    9.63%            35.40%
SCHIZONTS             L1    4,861,554    171,245     172,103    958,216    7.06%            19.71%
SCHIZONTS             L2    22,134,416   782,662     781,152    4,371,377  7.07%            19.75%

Supplementary Table 10: Total numbers of significant enhancer-promoter interactions at FDR 5% (IMR90)

Replicate  Uni&Multi-setting  Uni-setting  (Uni&Multi-Uni)/Uni %
Rep1       72,536             62,094       16.82%
Rep2       75,983             65,571       15.88%
Rep3       79,523             70,426       12.92%
Rep4       100,447            90,693       10.75%
Rep5       13,364             10,706       24.83%
Rep6       21,680             16,916       28.16%
Sum        363,533            316,406      14.89%

Supplementary Table 11: Significant enhancer-promoter interactions reproducible among all six replicates (IMR90)

Chromosome  Uni&Multi-setting  Uni-setting  (Uni&Multi-Uni)/Uni %
chr1        1438               1273         12.96%
chr2        1388               1250         11.04%
chr3        926                839          10.37%
chr4        505                393          28.50%
chr5        819                723          13.28%
chr6        858                776          10.57%
chr7        623                498          25.10%
chr8        537                499          7.62%
chr9        607                546          11.17%
chr10       680                614          10.75%
chr11       723                676          6.95%
chr12       650                585          11.11%
chr13       279                241          15.77%
chr14       390                356          9.55%
chr15       626                541          15.71%
chr16       334                293          13.99%
chr17       698                585          19.32%
chr18       249                211          18.01%
chr19       155                106          46.23%
chr20       298                270          10.37%
chr21       217                197          10.15%
chr22       245                207          18.36%
chrX        68                 41           65.85%
Sum         13313              11720        13.59%

Supplementary Table 12: Significant enhancer-promoter interactions reproducible among at least two replicates (IMR90)

Chromosome  Uni&Multi-setting  Uni-setting  (Uni&Multi-Uni)/Uni %
chr1        6942               5781         20.08%
chr2        5826               5235         11.29%
chr3        4032               3608         11.75%
chr4        2721               2271         19.82%
chr5        3847               3357         14.60%
chr6        4324               3777         14.48%
chr7        2924               2349         24.48%
chr8        2580               2323         11.06%
chr9        2887               2516         14.75%
chr10       3388               2926         15.79%
chr11       3604               3206         12.41%
chr12       3284               2888         13.71%
chr13       1530               1349         13.42%
chr14       1817               1573         15.51%
chr15       2381               2082         14.36%
chr16       1800               1478         21.79%
chr17       2814               2372         18.63%
chr18       1197               1062         12.71%
chr19       995                710          40.14%
chr20       1539               1367         12.58%
chr21       813                757          7.40%
chr22       1038               893          16.24%
chrX        688                478          43.93%
Sum         62971              54358        15.84%
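The reproducibility criteria behind these tables ("all six replicates", "at least two replicates") and the relative-gain column can be sketched as follows. The interaction keys (chromosome, enhancer bin, promoter bin) and the function names are hypothetical; how significant interactions are called in each replicate is assumed to be decided upstream.

```python
from collections import Counter


def reproducible(per_replicate, min_replicates):
    # Keep interactions called significant in at least `min_replicates` replicates.
    support = Counter()
    for interactions in per_replicate:
        support.update(set(interactions))  # count each replicate at most once
    return {key for key, n in support.items() if n >= min_replicates}


def relative_gain(uni_multi, uni):
    # (Uni&Multi - Uni) / Uni, expressed as a percentage.
    return 100.0 * (uni_multi - uni) / uni


# Toy example with three replicates; keys are (chrom, enhancer_bin, promoter_bin).
replicates = [
    {("chr1", 10, 25), ("chr1", 40, 52)},
    {("chr1", 10, 25)},
    {("chr1", 10, 25), ("chr1", 40, 52), ("chr2", 3, 9)},
]
in_all = reproducible(replicates, len(replicates))
in_two_or_more = reproducible(replicates, 2)

# chr1 row of Supplementary Table 12.
chr1_gain = relative_gain(6942, 5781)
print(f"{chr1_gain:.2f}%")  # 20.08%
```

Using a set per replicate keeps a duplicated call within one replicate from inflating its support count.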
