RAGEseq: Supplementary material Description: Supplementary figures and supplementary tables.

1 A B Loading of cells onto Single Cell 3’ Chip

GEM Generation and Barcoding

Post GEM-RT Cleanup

ow f Targetted Capture

Repeat cDNA Amplifcation

cDNA Amplifcation

Full-length cDNA library Standard 10X 3’ work 3’ 10X Standard Nanopore library Fragmentation Construction

Sequencing on Illumina Nanopore MinION Library Construction

Sequencing on Illumina NextSeq

Supplementary Figure 1. RAGE-Seq experimental protocol and computational pipeline. A) Detailed experimental workflow of RAGE-Seq. Single-cell suspensions are loaded onto 3’ library chips for the Chromium Single Cell 3’ Library according to the manufacturer’s recommendations (10X Genomics). Single cells are partitioned into Gel Beads in Emulsion (GEMs) with cell lysis and barcoded reverse transcription of RNA. Following isolation of GEMs from the Chromium instrument, full-length cDNA is PCR amplified. At this point the cDNA library is split into two fractions. The first fraction undergoes the remainder of the Chromium workflow which involves shearing, adapter ligation and Illumina sequencing. The second fraction undergoes two rounds of targeted capture of TCR and BCR followed by ONT library preparation using the 1D adapter ligation sequencing kit and sequencing on the MinION platform. (B) Detailed computational pipeline of RAGE-Seq.

2/10 A LYZ CD14 Jurkat reference TRA TRAV8-4 CDR3 TRAJ3

GCTGTGAGTGATCTCGAACCGAACAGCAGTGCTTCCAAGATAATC A V S D L E P N S S A S K I I TRB TRBV12-3 CDR3 TRBJ1-2

GCCAGCAGTTTCTCGACCTGTTCGGCTAACTATGGCTACACC A S S F S T C S A N Y G Y T CD3G CD79B

Ramos reference IGH VH4-34 CDR3 IGHJ6

GCGAGAGTTATTACTAGGGCGAGTCCTGGCACAGACGGGAGGTACGGTATGGACGTC A R V I T R A S P G T D G R Y G M D V IGL IGLV2-14 CDR3 IGLJ2

ACCTCATATACAAACGACAGCAATTCTCAGGTA T S Y T N D S N S Q V C Capture enrichment 50

40

30

20 On-target reads

10 On-target reads

0 Illumina sequencing Targetted capture + Nanopore sequencing

Cells

Supplementary Figure 2. RAGE-Seq cross-sequencing platform quality control measurements A) Reference CDR3, V and J segments that encode the TCRa and TCRb chains of Jurkat or the immunoglobulin heavy and light chains of Ramos. For Ramos the most abundant heavy chain CDR3 (1123/1278 cells) and light chain CDR3 sequences (104/937) were chosen as the reference. B) tSNE of key canonical gene expression markers used to identify cell type of cluster. CD3G was used to identify Jurkat cells, CD79B was used to identify Ramos cells and LYZ and CD14 were used to identify monocyte cells. (C) The relative enrichment of targeted capture of antigen genes. On-target reads were determined by the percentage of total Nanopore or Illumina sequencing reads that align to TRA, TRB, IGH, IGL and IGK constant region genes. (D) Nanopore cell barcode recovery for Nanopore reads that are on-target. Mean on-target reads per cell type: Jurkat, 309; Ramos, 646; Monocyte, 1.49.

3/10 A Cell type Jurkat Monocyte Ramos B

2000 Mean contigs/cell: 1500 15 Jurkat: 4.26 contigs/cell Ramos: 5.24 contigs/cell 1500 Monocyte: 0.12 contigs/cell 1000 1000 10 500 500 5 Contig frequency On-target reads 0 On-target reads 0 NR TCRα TCRβ TCR α+TCRβ No chain IgH IgL Paired 0 1000 2000 3000 Cells C D Assignment of TCR or BCR chains by TCR Heavy chain V and J gene segment usage No TCR No heavy chain 100 100 TRAV18 TRAV19- -TRAJ18 TRBV12-3 TRBV12-3- TRAJ53 1 TRAV17 -TRBJ2-1 TRBJ1-3 14 1 -TRAJ40 2 1 80 TRAV16- TRAJ22 43 s s TRAV8-4 TRBV12-3 l l 60 -TRAJ3 -TRBJ1-2 el el

c c

f f 50 o o

40 % % 472 856 20 TCRα Jurkat TCRβ Jurkat 0 0 IGLV2-23-IGLJ3 IGKV5-2-IGKJ2 0 0 0 0 IGHV4-31-IGHJ6 IGHV4-59-IGHJ6 0 1-5 6-10 11-20 21-40 >40 2 5 0 0 2 10 14 0-10 1 6 11- 21- > 51-1 IGLV2-14- # TRAC UMI (Illumina) / cell IGHV4-34 IGLJ2 IGHM UMIs (Illumina) / cell -IGHJ6 E TCR assignment Jurkat 1000 IgH + IgL 1278 TCRα 40 Jurkat IgL TCRβ IgH TCRα + TCRβ Monocytes TRA + TRB Heavy chain Light chain TRB 20 TRA Ramos Ramos NR F 0 200 400 600 # cells 0 non-reference non-productive tSNE_2 Ramos TCRα IgH + IgL 36 29 5 12 −20 IgL 6 19 IgH Ramos TRA + TRB TRB −20 0 20 TRA tSNE_1 NR 0 200 400 600 800 # cells BCR assignment Monocyte 437 437 467 40 IgL Jurkat IgH IgH + IgL IgL IgH + IgL Monocytes IgH 20 TRA + TRB Canu + Racon + Nanopolish TRB TRA NR 0 0 100 200 300 TCRβ tSNE_2 # cells 3 10 719 23 39

−20 IgH + IgL Ramos IgL IgH TRA + TRB −20 0 20 TRB tSNE_1 TRA NR

0 20 40 60 # cells 790 800 853

Canu + Racon + Nanopolish

Supplementary Figure 3. Quality control measurements of antigen receptor assembly. (A) Mean number of contigs assembled per cell. Each bar corresponds to an individual cell. (B) The number of on-target Nanopore reads for Jurkat (left panel) or Ramos (right panel) grouped by the recovery of TCR chains or BCR chains, respectively. NR, no receptor. Only those TCR and BCR chains that match their reference V and J gene and contain in-frame CDR3 sequences and lack stop codons, termed a productive clonotype, were assigned. (C) The recovery of Jurkat cells assigned a TCRa chain (left panel) or Ramos cells assigned a Immunoglobulin heavy chain (right panel) as a function of the number of Illumina UMIs TRAC (Jurkat) or IGHM (Ramos) genes per cell. (D) Assignment of TCR chains to Jurkat cells or BCR chains to Ramos cells based on their V and J gene segment usage. Shown in each pie graph is the number of cells expressing the designated V and J genes. Only productive clonotypes are assigned. (E) tSNE plot of Jurkat, Ramos and monocyte cells (Fig. 2A) assigned TCR (top panel) or BCR (bottom panel) chains. Right panels show the total number of cells assigned different chains for each cell type. Doublets (n=136 cells) are not shown on the tSNE plots and were filtered out based on high gene count (see Methods). (F) Accuracy of CDR3 sequences of Jurkat cells at each stage of contig assembly and polishing. Shown are the number of cells assigned a CDR3 sequence that match the reference TRA or TRB CDR3. ‘Non-reference’ refers to a CDR3 sequence that does not match the reference Jurkat CDR3. ‘Non-productive’ refers to TCR chains with a CDR3 sequence that is out-of-frame or contains stop-codons and are usually filtered from the dataset. Only TCR chains that match the Jurkat reference V and J gene segments are assigned. 4/10 4A 1259/1278 cells with full length IGHV 988/1000 cells with full-length IGLV 285 270 280 268 275 266 nucleotide length nucleotide length 270 264 265 IGLV2-14 IGHV4-34 262 260

0 200 400 600 800 1000 1200 0 200 400 600 800 1000

4B Jurkat TRAV Jurkat TRBV

Cell Cell ID ID

# cells with TRAV mutations: 25/476 # cells with TRBV mutations: 23/856

Supplementary Figure 4. Quality control measurements of calling somatic hypermutation in individual cells. A) Nucleotide length of Ramos heavy chain (left) and light chain (right) V regions. The maximum length of the entire IGVH4-34 (left) or IGLV2-13 (right) gene is indicated. (B) Heatmap of the V regions of individual Jurkat cells encoding the TCRa (right) and TCRb (left) chain. Each row represents an individual cell and each column a nucleotide position in the respective V gene. Light blue represents synonymous nucleotide substitutions while dark blue represents non-synonymous nucleotide substitutions, when compared to germline TRAV8-4 and TRBV12-3 sequences.

5/10 CD79A IGHD CD3E CD8 T−cell(EFF) / NKT: 405/405 A TRAC B CD4 T−Cell 1 (CM): 1096/1096 CD4 T−Cell 2 (CM): 226/226 CD4 T−Cell (EM): 1069/1069 γδ T−Cell / ILC / NK: 144/144 B−Cell Naive : 853/853 CD4 TFH: 142/142 B−Cell Memory: 738/738 Doublet: 86/86 CD4 T reg: 740/740 Epithelial: 13/13 CD4 CD8A NKG7 TRBC1 CD8 T−cell (CM): 487/487 Plasmablast : 28/28

10000

TRGC2 TRDC IL2RA FOXP3

100

CTLA4 PDCD1 CD40LG CCR7 ot