US 20210005284A1 IN ( 19 ) United States ( 12 ) Patent Application Publication ( 10 ) Pub . No .: US 2021/0005284 A1 Nuzhdina et al . ( 43 ) Pub . Date : Jan. 7 , 2021

( 54 ) TECHNIQUES FOR NUCLEIC ACID DATA G16B 25/10 ( 2006.01 ) QUALITY CONTROL G16B 35/10 ( 2006.01 ) ( 52 ) U.S. CI . ( 71 ) Applicant: BostonGene Corporation , Waltham , CPC G16B 30/10 ( 2019.02 ) ; G16B 35/10 MA ( US ) ( 2019.02 ) ; G16B 25/10 ( 2019.02 ) ; G16B ( 72 ) Inventors : Ekaterina Nuzhdina , Moscow (RU ); 20/20 (2019.02 ) Alexander Bagaev , Moscow ( RU ); Maksim Chelushkin , Moscow (RU ) ; Yaroslav Lozinsky, Moscow ( RU ); ( 57 ) ABSTRACT Natalia Miheecheva , Moscow (RU ) ; Alexander Zaitsev , Leninskiy R - N (RU ) Described herein are various methods of collecting and processing of tumor and /or healthy tissue samples to extract ( 21 ) Appl . No .: 16 / 920,641 nucleic acid and perform nucleic acid sequencing . Also described herein are various methods of processing nucleic ( 22 ) Filed : Jul . 3 , 2020 acid sequencing data to remove bias from the nucleic acid sequencing data . Also described herein are various methods Related U.S. Application Data of evaluating the quality of nucleic acid sequence informa ( 60 ) Provisional application No. 62 / 991,570 , filed on Mar. tion . The identity and / or integrity of nucleic acid sequence 18 , 2020 , provisional application No. 62 /870,622 , data is evaluated prior to using the sequence information for filed on Jul. 3 , 2019 . subsequent analysis ( for example for diagnostic , prognostic , or clinical purposes ). The methods enable a subject, doctor, Publication Classification or user to characterize or classify various types of cancer ( 51 ) Int . Cl . precisely , and thereby determine a therapy or combination of G16B 30/10 ( 2006.01 ) therapies that may be effective to treat a cancer in a subject G16B 20/20 ( 2006.01 ) based on the precise characterization .

110 BEGIN

Obtain a biological sample ( e.g. , a tumor sampte ) from a subject 111 having, suspected of having , or at risk of having cancer

Extract nucleic acid from the biological sample 112

Perform one or more quality control assessments of extracted nucleic 113 acid

Prepare nucleic acid library with extracted nucleic acid -114 that satisfies nucleic acid quality control

Sequence DNA and / or RNA nucleic acid , -115 obtain RNA expression data

Process RNA expression data to obtain bias - corrected 116 expression data

Perform one or more quality control assessmients on gene -117 expression data

Analyze data 118

Identify a cancer treatment for the subject based on 119 the gene expression data

End Patent Application Publication Jan. 7 , 2021 Sheet 1 of 22 US 2021/0005284 A1

BEGIN 200100

Obtain a sample of a tumor from a subject having , suspected of 101 having , or at risk of having cancer

102 Quality control step 1 - Perform one or more quality control assessments on tumor sample

Quality control step 2 - Perform one or more quality control assessments on 103 DNA , RNA , and / or libraries processes

Quality control step 3 - Perform one or more quality control 104 assessments on nucleic acid sequence data

End

FIG . 1A Patent Application Publication Jan. 7 , 2021 Sheet 2 of 22 US 2021/0005284 A1

110 BEGIN

Obtain a biological sample ( e.g., a tumor sample ) from a subject 111 having , suspected of having, or at risk of having cancer

Extract nucleic acid from the biological sample 112

Perform one or more quality control assessments of extracted nucleic 113 acid

Prepare nucleic acid library with extracted nucleic acid -114 that satisfies nucleic acid quality control

Sequence DNA and / or RNA nucleic acid , 115 obtain RNA expression data

Process RNA expression data to obtain bias - corrected gene -116 expression data

Perform one or more quality control assessments on gene expression data ' 117

Analyze gene expression data -118

Identify a cancer treatment for the subject based on 119 the gene expression data

End FIG . 1B Patent Application Publication Jan. 7 , 2021 Sheet 3 of 22 US 2021/0005284 A1

enrichmentpolyA FIG.2A

rRNAdepletion Patent Application Publication Jan. 7 , 2021 Sheet 4 of 22 US 2021/0005284 A1

FIG.2B

WX08 ) uossa Patent Application Publication Jan. 7 , 2021 Sheet 5 of 22 US 2021/0005284 A1

IILE Samples

FIG . 3 Patent Application Publication Jan. 7 , 2021 Sheet 6 of 22 US 2021/0005284 A1

Histone length ( Hela ) Gene Median length , nt HIST1H4C 20,0 HISTIH1C 25,5 HIST1H2BK 71,0 HIST1H2AE 14,0 HIST1H3B 15,0 HIST3H2A 17,0

FIG . 4A Patent Application Publication Jan. 7 , 2021 Sheet 7 of 22 US 2021/0005284 A1

Polya Total

FIG . 4B Patent Application Publication Jan. 7 , 2021 Sheet 8 of 22 US 2021/0005284 A1

SIKIO

Polya Total

FIG . 4C Patent Application Publication Jan. 7 , 2021 Sheet 9 of 22 US 2021/0005284 A1

b

FIG . 5 Patent Application Publication Jan. 7. 2021 Sheet 10 of 22 US 2021/0005284 A1

BEGIN 200

Obtain a first sample of a first tumor from a -201 subject having , suspected of having , or at risk of having cancer

202 Extract RNA from the first sample of the first tumor

Enrich the extracted RNA for coding RNA to obtain 203 enriched RNA

Prepare a first library of CDNA fragments from the 204 enriched RNA for non - stranded RNA sequencing

Perform non - stranded RNA sequencing on 205 the first library of cDNA fragments from the enriched RNA

End FIG . 6A Patent Application Publication Jan. 7 , 2021 Sheet 11 of 22 US 2021/0005284 A1

BEGIN 210

Obtain RNA expression data for a subject having , suspected of 211 having , or at risk of having cancer

Align the RNA expression data to a reference and annotate the RNA * 212 expression data

Remove non - coding transcripts from the annotated RNA expression data to -213 obtain filtered RNA expression data

Normalize the filtered RNA expression data to obtain gene expression 214 data in transcripts per million ( TPM )

Identify at least one gene that introduces bias in the gene expression data 215

Remove at least one gene from the gene expression data to obtain 216 bias - corrected gene expression data

Identify a cancer treatment for the subject using the bias - corrected -217 gene expression data

End FIG . 6B Patent Application Publication Jan. 7 , 2021 Sheet 12 of 22 US 2021/0005284 A1

220 BEGIN

Enrich RNA for coding RNA in a sample of extracted RNA from a first 221 tumor sample from a subject having , suspected of having , or at risk of having cancer

Perform non - stranded RNA sequencing on a first library of cDNA fragments prepared from the enriched RNA to obtain RNA 222 expression data

Convert the RNA expression data to gene expression data 223

Identify at least one gene that introduces bias in the gene expression 224 data in transcripts per kilobase million ( TPM )

Remove at least one gene from the gene expression data to 225 obtain bias - corrected gene expression data

Identify a cancer treatment for the subject using the 226 bias - corrected gene expression data

End FIG . 6C Patent Application Publication Jan. 7. 2021 Sheet 13 of 22 US 2021/0005284 A1

300

308

301 309

302 . 310 303

311 304 .

312 305 313

306 314

307 315

FIG . 7 Patent Application Publication Jan. 7 , 2021 Sheet 14 of 22 US 2021/0005284 A1

800

Obtain nucleic acid data comprising : sequence data and asserted information indicating an asserted 801 source and / or an asserted integrity of the sequence data

Process the nucleic acid data 802a 802b 802 Process the sequence data to Process the sequence data to obtain determined information obtain determined information indicating a determined source indicating a determined of the sequence data integrity of the sequence data

803 Match ?

Yes

No Process the sequence data to determine whether the sequence data is indicative of one or more disease 804 features and / or determine a therapy for the subject

Perform remedial action ( s ) 805

FIG . 8 Patent Application Publication Jan. 7 , 2021 Sheet 15 of 22 US 2021/0005284 A1

500

Kiowaw 520 550

FIG9.

540 510 530 Patent Application Publication Jan. 7 , 2021 Sheet 16 of 22 US 2021/0005284 A1

640 660

600

620 670

610 FIG.10

SampleBokcal

089 680 *** 650 OD Patent Application Publication Jan. 7 , 2021 Sheet 17 of 22 US 2021/0005284 A1

301302303 Normal 60400701 8:00 C.06:02 83

WES 8,04:0103:03 0:0802,06:02 8:5301.550 FIG.11 FIG.12

RNA-Seq C.06:02,06:02 8370103 8633602 3703

103 Patent Application Publication Jan. 7. 2021 Sheet 18 of 22 US 2021/0005284 A1

FIG.13A Good

*** Patent Application Publication Jan. 7. 2021 Sheet 19 of 22 US 2021/0005284 A1

FIG.13B Patent Application Publication Jan. 7. 2021 Sheet 20 of 22 US 2021/0005284 A1

Total FIG.14A Polya

VAjodjejoi jo killigeqoid Patent Application Publication Jan. 7. 2021 Sheet 21 of 22 US 2021/0005284 A1

Total Polya FIG.14B

0.5 Vhod / e10l jo yliqeqoid Patent Application Publication Jan. 7 , 2021 Sheet 22 of 22 US 2021/0005284 A1

...ilili

***

369211 260240, FIG.15 US 2021/0005284 A1 Jan. 7 , 2021 1

TECHNIQUES FOR NUCLEIC ACID DATA or a determined integrity of the sequence data ; and deter QUALITY CONTROL mining whether the determined information matches the asserted information . CROSS REFERENCE TO RELATED [ 0005 ] Some embodiments provide at least one non - tran APPLICATIONS sitory computer - readable storage medium storing processor executable instructions that, when executed by at least one [ 0001 ] This application claims the benefit under 35 U.S.C. computer hardware processor, cause the at least one com $ 119 ( e ) of U.S. Provisional Application No. 62 / 870,622 , puter hardware processor to perform a method . The method filed Jul . 3 , 2019 , entitled “ Compositions and Methods for comprising: obtaining nucleic acid data comprising : Sample Preparation and Characterization of Cancer There sequence data indicating a nucleotide sequence for at least 5 from " and U.S. Provisional Application No. 62 / 991,570 , kilobases ( kb ) of DNA and / or RNA from a previously filed Mar. 18 , 2020 , entitled “ Nucleic Acid Data Quality obtained biological sample of a subject having, suspected of Control , ” the entire disclosure of each is hereby incorporated having, or at risk of having a disease ; and asserted infor by reference. mation indicating an asserted source and / or an asserted integrity of the sequence data ; and validating the nucleic FIELD acid data by : processing the sequence data to obtain deter mined information indicating a determined source and / or a [ 0002 ] Some aspects of the technology described herein determined integrity of the sequence data ; and determining relate to collecting and processing of tumor and / or healthy whether the determined information matches the asserted tissue samples to extract nucleic acid and perform nucleic information . acid sequencing. Some aspects of the technology described [ 0006 ] Some embodiments using at least one computer herein relate to processing nucleic acid sequencing data to hardware processor to perform : obtaining nucleic acid data remove bias from the nucleic acid sequencing data . Also comprising: sequence data indicating a nucleotide sequence described herein are various methods of evaluating the for at least 5 kilobases ( kb ) of DNA and / or RNA from a quality of nucleic acid sequence information obtained by previously obtained biological sample of a subject having, sequencing suspected of having , or at risk of having a disease ; and asserted information indicating an asserted source and /or an BACKGROUND asserted integrity of the sequence data; and validating the [ 0003 ] Correctly characterizing the type or types of cancer nucleic acid data by : processing the sequence data to obtain a patient or subject has and , potentially , selecting one or determined information indicating a determined source and / more effective therapies for the patient based on the char or a determined integrity of the sequence data ; and deter acterization can be crucial for the survival and overall mining whether the determined information matches the wellbeing of that patient. The manner in which biological asserted information . samples from a subject are processed to obtain sequence [ 0007 ] In some embodiments, the sequence data may data ( e.g. , RNA expression data ) to characterize the type or include raw DNA or RNA sequence data , DNA exome types of cancer, and the manner in which the data is sequence data ( e.g. , from whole exome sequencing ( WES ) , processed may have detrimental effects on the characteriza DNA genome sequence data ( e.g. , from whole genome tion of the cancer or cancers . For example, high throughput sequencing ( WGS ) ), RNA expression data, gene expression nucleic acid sequencing platforms ( e.g. , next generation data, bias - corrected gene expression data or any other suit sequencing platforms) can generate large amounts of DNA able type of sequence data comprising data obtained from a and RNA sequence data from patient samples . Advances in sequencing platform and / or comprising data derived from sample preparation , data processing, and evaluation of data obtained from a sequencing platform . [ 0008 ] In some embodiments , the method further com sequence information from different NGS platforms by prises processing the sequence data to determine whether the custom software for characterizing cancers , predicting prog sequence data is indicative of one or more disease features noses , identifying effective therapies, and otherwise aiding when it is determined that the asserted information matches in personalized care of patients with cancer are needed . the determined information . [ 0009 ] In some embodiments, the method further com SUMMARY prises determining that the determined information matches [ 0004 ] Some embodiments provide for a system compris the asserted information ; and processing the sequence data ing : at least one computer hardware processor ; and at least to determine whether it is indicative of one or more disease one non -transitory computer -readable storage medium stor features. ing processor executable instructions that, when executed by [ 0010 ] In some embodiments, the method further com the at least one computer hardware processor, cause the at prises generating an indication: that the determined infor least one computer hardware processor to perform a method, mation does not match the asserted information , to not the method comprising : obtaining nucleic acid data com process the sequencedata in a subsequent analysis, and /or to prising: sequence data indicating a nucleotide sequence for obtain additional sequence data and / or other information at least 5 kilobases ( kb ) of DNA and / or RNA from a about the biological sample and / or the subject, when it is previously obtained biological sample of a subject having , determined that the asserted information does not match the suspected of having, or at risk of having a disease ; and determined information . asserted information indicating an asserted source and / or an [ 0011 ] In some embodiments, the method further com asserted integrity of the sequence data ; and validating the prises: determining that the asserted information does not nucleic acid data by : processing the sequence data to obtain match the determined information ; and generating an indi determined information indicating a determined source and / cation : that the determined information does not match the US 2021/0005284 A1 Jan. 7 , 2021 2 asserted information , to not process the sequence data in a tion ; single nucleotide polymorphisms ( SNPs ) ; complexity ; subsequent analysis, and / or to obtain additional sequence and guanine ( G ) and cytosine ( C ) percentage ( % ) of the data and /or other information about the biological sample sequence data . and /or the subject. [ 0020 ] In some embodiments, the asserted information for [ 0012 ] In some embodiments, the asserted information the sequence data comprises MHC allele information for the indicates the asserted source of the sequence data , the subject . method further comprising processing the sequence data to [ 0021 ] In some embodiments, the method further com obtain determined information indicative of a determined prises determining one or more MHC allele sequences from source for the sequence data ; and determining whether the the sequence data and determining whether the one or more determined source matches the asserted source for the MHC alleles sequences match the asserted MHC allele sequence data . information for the subject. [ 0022 ] In some embodiments, determining the one or [ 0013 ] In some embodiments , the determined information more MHC allele sequences comprises determining MHC indicative of the determined source for the sequence data is allele sequences for six MHC loci from the sequence data . indicative of an MHC genotype of the subject; whether the [ 0023 ] In some embodiments , wherein the sequence data nucleic acid data is RNA data or DNA data ; a tissue type of indicates the nucleotide sequence for RNA , the asserted the biological sample; a tumor type of the biological sample ; information indicates whether the RNA is polyA enriched . a sequencing platform used to generate the sequence data; [ 0024 ] In some embodiments, determining, using the SNP concordance , and / or a whether an RNA sample is sequence data , a therapy for the subject when it is deter polyA enriched . mined that the asserted information matches the determined [ 0014 ] In some embodiments, the determined information information . indicative of the determined source for the sequence data is [ 0025 ] In some embodiments, determining the therapy indicative of at least two of an MHC genotype of the subject; comprises : determining a plurality of gene group expression whether the nucleic acid data is RNA data or DNA data ; a levels , the plurality of gene group expression levels com tissue type of the biological sample ; a tumor type of the prising a gene group expression level for each gene group in biological sample; a sequencing platform used to generate a set of gene groups , wherein the set of gene groups the sequence data ; SNP concordance , and a whether an RNA comprises at least one gene group associated with cancer sample is polyA enriched . malignancy, and at least one gene group associated with [ 0015 ] In some embodiments , the determined information cancer microenvironment; and identifying the therapy using indicative of the determined source for the sequence data is the determined gene group expression levels . indicative of at least three of an MHC genotype of the [ 0026 ] In some embodiments , the method further com subject; whether the nucleic acid data is RNA data or DNA prises administering the therapy to the subject. data ; a tissue type of the biological sample ; a tumor type of [ 0027 ] In some embodiments , wherein it is determined the biological sample ; a sequencing platform used to gen that the determined information matches the asserted infor erate the sequence data ; SNP concordance , and a whether an mation , the sequence data is processed to determine a RNA sample is polyA enriched . therapy for the subject, and the therapy is administered to the [ 0016 ] In some embodiments , the asserted information subject indicates the asserted integrity of the sequence data , the [ 0028 ] In some embodiments , wherein the disease is can method further comprising : processing the sequence data to cer , and the therapy is a cancer treatment. In some embodi obtain determined information indicative of a determined ments, the subject is . integrity of the sequence data; and determining whether the [ 0029 ] In some embodiments, processing the sequence determined integrity matches the asserted integrity for the data to obtain the determined source comprises determining sequence data . one or more single nucleotide polymorphisms ( SNPs ) in the sequence data , and determining whether the one or more [ 0017 ] In some embodiments, the determined information SNPs in the sequence data match one or more SNPs in a indicative of the determined integrity is indicative of total reference sequence . sequence coverage ; exon coverage ; chromosomal coverage ; [ 0030 ] In some embodiments, the reference sequence is a a ratio of nucleic acids encoding two or more subunits of a sequence of a nucleic acid in a second biological sample of multimeric ; species contamination ; single nucleotide the subject. polymorphisms ( SNPs ) ; complexity ; and / or guanine ( G ) and [ 0031 ] In some embodiments, processing the sequence cytosine ( C ) percentage ( % ) of the sequence data . data to obtain a determined integrity comprises : determining [ 0018 ] In some embodiments, the determined information a first level of a first nucleic acid encoding a first subunit of indicative of the determined integrity is indicative of at least a multimeric protein , determining a second level of a second two of total sequence coverage ; exon coverage; chromo nucleic acid encoding a second subunit of a multimeric somal coverage ; a ratio of nucleic acids encoding two or protein , and determining whether a ratio between the first more subunits of a multimeric protein ; species contamina level and the second level matches an expected ratio . In tion ; single nucleotide polymorphisms ( SNPs ) ; complexity ; some embodiments, the multimeric protein is a dimer. In and guanine ( G ) and cytosine ( C ) percentage ( % ) of the some embodiments, the first subunit and the second subunits sequence data . are first and second CD3 subunits , first and second CD8 [ 0019 ] In some embodiments , the determined information subunits , or first and second CD79 subunits . indicative of the determined integrity is indicative of at least [ 0032 ] Some embodiments provide for a system for iden three of total sequence coverage ; exon coverage ; chromo tifying a cancer treatment for a subject having, suspected somal coverage ; a ratio of nucleic acids encoding two or having, or at risk of having cancer , the system comprising: more subunits of a multimeric protein ; species contamina at least one sequencing platform configured to generate gene US 2021/0005284 A1 Jan. 7. 2021 3 expression data from enriched RNA obtained from a first bias in the gene expression data ; and identifying a cancer biological sample previously obtained from the subject, treatment for the subject using the bias - corrected gene wherein the enriched RNA was obtained by : ( i ) extracting expression data . RNA from the first biological sample of the first tumor to [ 0035 ] Some embodiments provide for a method compris obtain extracted RNA ; and ( ii ) enriching the extracted RNA ing : obtaining a first biological sample of a first tumor, the for coding RNA to obtain enriched RNA , wherein the RNA first biological sample previously obtained from a subject expression data comprises at least 5 kilobases (kb ); at least having, suspected of having or at risk of having cancer ; one computer hardware processor ; and at least one non extracting RNA from the first biological sample of the first transitory computer - readable storage medium storing pro tumor to obtain extracted RNA ; enriching the extracted RNA for coding RNA to obtain enriched RNA ; sequencing, cessor - executable instructions that, when executed by the at using at least one sequencing platform , the enriched RNA to least one computer hardware processor, cause the at least obtain RNA expression data comprising at least 5 kilobases one computer hardware processor to perform : obtaining the ( kb ); using at least one computer hardware processor to RNA expression data using the at least one sequencing perform : obtaining the RNA expression data using the at platform ; converting the RNA expression data to gene least one sequencing platform ; converting the RNA expres expression data ; determining bias - corrected gene expression sion data to gene expression data ; determining bias- cor data from the gene expression data at least in part by rected gene expression data from the gene expression data at removing , from the gene expression data, expression data least in part by removing , from the gene expression data , for at least one gene that introduces bias in the gene expression data for at least one gene that introduces bias in expression data; and identifying a cancer treatment for the the gene expression data ; and identifying a cancer treatment subject using the bias - corrected gene expression data . for the subject using the bias - corrected gene expression data . [ 0033 ] Some embodiments provide for a system for iden [ 0036 ] In some embodiments, the method further com tifying a cancer treatment for a subject having, suspected prises administering the identified cancer treatment to the having, or at risk of having cancer , the system comprising : subject. at least one computer hardware processor ; and at least one [ 0037 ] In some embodiments, enriching the RNA for non - transitory computer - readable storage medium storing coding RNA comprises performing polyA enrichment. [ 0038 ] In some embodiments, the at least one gene that processor - executable instructions that, when executed by the introduces bias in the gene expression data comprises : a at least one computer hardware processor , cause the at least gene having an average transcript length that is higher or one computer hardware processor to perform : obtaining lower than an average length of transcripts in the gene RNA expression data from at least one sequencing platform , expression data ; a gene having at least a threshold variation the RNA expression data comprising at least 5 kilobases ( 5 in average transcript expression level based on transcript kb ), wherein the RNA expression data was obtained , from a expression levels in reference samples; and / or a gene that first biological sample of a first tumor previously obtained has a polyA tail that is at least a threshold amount smaller in from the subject, at least in part by : ( i ) extracting RNA from length compared to an average length of polyA tails of the first biological sample of the first tumor to obtain from : the first biological sample from which the RNA extracted RNA ; and ( ii ) enriching the extracted RNA for expression data was obtained and / or a reference sample. coding RNA to obtain enriched RNA ; converting the RNA [ 0039 ] In some embodiments, the at least one gene that expression data to gene expression data ; determining bias introduces bias in the gene expression data belongs to a corrected gene expression data from the gene expression family of genes selected from the group consisting of: data at least in part by removing, from the gene expression - encoding genes, mitochondrial genes , interleukin data, expression data for at least one gene that introduces encoding genes, collagen - encoding genes, B - cell receptor bias in the gene expression data; and identifying a cancer encoding genes , and T cell receptor - encoding genes. treatment for the subject using the bias - corrected gene [ 0040 ] In some embodiments, the at least one gene com expression data . The system may further comprise the at prises at least one histone - encoding gene selected from the least one sequencing platform in some embodiments. group consisting of: HIST1H1A , HIST1H1B , HISTIHIC , [ 0034 ] Some embodiments provide for at least one non HISTIHID , HISTIHIE , HISTIHIT , HIST1H2AA , transitory computer - readable storage medium storing pro HIST1H2AB , HIST1H2AC , HIST1H2AD , HIST1H2AE , cessor - executable instructions that, when executed by at HIST1H2AG , HIST1H2AH , HIST1H2AI , HIST1H2AJ , least one computer hardware processor, cause the at least HIST1H2AK , HIST1H2AL , HIST1H2AM , HIST1H2BA , one computer hardware processor to perform : obtaining HIST1H2BB , HIST1H2BC , HIST1H2BD , HIST1H2BE , RNA expression data from at least one sequencing platform , HIST1H2BF , HIST1H2BG , HIST1H2BH , HIST1H2BI , the RNA expression data comprising at least 5 kilobases HIST1H2BJ , HIST1H2BK , HIST1H2BL , HIST1H2BM , ( 5kb ), wherein the RNA expression data was obtained , from HIST1H2BN , HIST1H2BO , HIST1H3A , HIST1H3B , a first biological sample of a first tumor previously obtained HIST1H3C , HIST1H3D , HISTI??? ,, HIST1H3F, from a subject having, suspected of having or at risk of HIST1H3G , HIST1??? ,, HIST1H31 , HIST1??? ,, having cancer, at least in part by : ( i ) extracting RNA from HIST1H4A , HIST1H4B , HIST1H4C , HIST1H4D , the first biological sample of the first tumor to obtain HIST1H4E , HIST1H4F , HIST1H4G , HIST1H4H , extracted RNA ; and ( ii ) enriching the extracted RNA for HIST1H41 , HIST1H4J , HIST1H4K , HIST1H4L , coding RNA to obtain enriched RNA ; converting the RNA HIST2H2AA3, HIST2H2AA4 , HIST2H2AB , expression data to gene expression data ; determining bias HIST2H2AC , HIST2H2BE , HIST2H2BF , HIST2H3A , corrected gene expression data from the gene expression HIST2H3C , HIST2H3D , HIST2H3PS2 , HIST2H4A , data at least in part by removing, from the gene expression HIST2H4B , HIST3H2A , HIST3H2BB , HIST3H3 , and data, expression data for at least one gene that introduces HIST4H4 . US 2021/0005284 A1 Jan. 7 , 2021 4

[ 0041 ] In some embodiments, the at least one gene com [ 0048 ] In some embodiments, identifying the cancer treat prises at least one mitochondrial gene selected from the ment for the subject using the bias - corrected gene expres group consisting of: MT- ATP6 , MT- ATP8, MT -CO1 , MT sion data comprises : determining, using the bias - corrected CO2 , MT - CO3 , MT -CYB , MT - ND1, MT -ND2 , MT -ND3 , gene expression data , a plurality of gene group expression MT- ND4 , MT- ND4L , MT- ND5 , MT-ND6 , MT- RNR1, MT levels , the plurality of gene group expression levels com RNR2, MT - TA , MT- TC , MT- TD , MT - TE , MT - TF , MT - TG , prising a gene group expression level for each gene group in MT- TH , MT- TI, MT- TK , MT- TL1, MT- TL2, MT- TM , MT a set of gene groups, wherein the set of gene groups TN , MT - TP, MT - TQ , MT - TR , MT - TS1 , MT - TS2 , MT - TT, comprises at least one gene group associated with cancer MT - TV , MT- TW , MT- TY, MTRNR2L1, MTRNR2L10 , malignancy, and at least one gene group associated with MTRNR2L11 , MTRNR2L12 , MTRNR2L13 , MTRNR2L3, cancer microenvironment; and identifying the cancer treat MTRNR2L4, MTRNR2L5, MTRNR2L6 , MTRNR2L7, and ment using the determined gene group expression levels . MTRNR2L8 . [ 0049 ] In some embodiments, the cancer treatment is [ 0042 ] In some embodiments, determining the bias -cor selected from the group consisting of a radiation therapy, a rected gene expression data further comprises: after remov surgical therapy, a chemotherapy, and an immunotherapy. [ 0050 ] In some embodiments, the method further com ing the expression data for the at least one gene that prises obtaining a second biological sample of a second introduces bias in the gene expression data, renormalizing tumor, the second biological sample previously obtained the gene expression data . from the subject. [ 0043 ] In some embodiments, converting the RNA expres [ 0051 ] In some embodiments, the method further com sion data to gene expression data comprises: removing prises combining the first biological sample and the second non -coding transcripts from the RNA expression data to biological sample to form a combined tumor sample, and obtain filtered RNA expression data ; and after removing the extracting the RNA comprises extracting the RNA from the non - coding transcripts , normalizing the filtered RNA expres combined tumor sample . sion data to obtain gene expression data in transcripts per [ 0052 ] In some embodiments, the method further com million ( TPM ) and / or any other suitable format . prises extracting RNA from the second biological sample ; [ 0044 ] In some embodiments, removing the non - coding and combining the RNA extracted from the second biologi transcripts from the RNA expression data comprises remov cal sample with the RNA extracted from the first biological ing non - coding transcripts that belong to groups selected sample to form combined extracted RNA , and enriching the from the list consisting of: pseudogenes , polymorphic RNA for coding RNA comprises enriching the combined pseudogenes, processed pseudogenes, transcribed processed extracted RNA for coding RNA . pseudogenes, unitary pseudogenes, unprocessed pseudo [ 0053 ] In some embodiments, the extracted RNA com genes , transcribed unitary pseudogenes, constant chain prises at least 1 ug of RNA upon RNA extraction . immunoglobulin ( IG C ) pseudogenes, joining chain immu [ 0054 ] In some embodiments , the extracted RNA is at noglobulin ( IG J ) pseudogenes, variable chain immuno least 1000-6000 ng in total mass , and has a purity corre globulin ( IG V ) pseudogenes, transcribed unprocessed sponding to a ratio of absorbance at 260 nm to absorbance pseudogenes, translated unprocessed pseudogenes , joining at 280 nm of at least 2.0 . chain T cell receptor ( TR J ) pseudogenes, variable chain T [ 0055 ] In some embodiments , the method further com cell receptor ( TR V ) pseudogenes, small nuclear RNAs prises performing quality control assessment on the RNA ( snRNA ), small nucleolar RNAs ( snoRNA ), microRNAs expression data at least in part by : obtaining asserted infor ( miRNA ) , ribozymes, ribosomal RNA ( rRNA ) , mitochon mation indicating an asserted source and / or an asserted drial tRNAs (Mt tRNA ), mitochondrial rRNAs (Mt rRNA ), integrity of the RNA expression data; processing the RNA small Cajal body - specific RNAs ( scaRNA ), retained , expression data to obtain determined information indicating sense intronic RNA , sense overlapping RNA , nonsense a determined source and / or a determined integrity of the mediated decay RNA , non - stop decay RNA , antisense RNA , RNA expression data ; and determining whether the deter long intervening noncoding RNAs ( lincRNA ), macro long mined information matches the asserted information . non - coding RNA (macro lncRNA ), processed transcripts , [ 0056 ] In some embodiments , processing the RNA expres 3prime overlapping non - coding RNA ( 3prime overlapping sion data comprises processing the RNA expression RNA to ncrna ), small RNAs ( SRNA ), miscellaneous RNA ( misc determine : a tissue type of the first biological sample ; a RNA ), vault RNA ( vaultRNA ), and TEC RNA . tumor type of the first biological sample ; and / or guanine ( G ) [ 0045 ] In some embodiments, information ( e.g. , sequence and /or cytosine ( C ) percentage ( % ) . information ) for one or more transcripts for one of more of these types of transcripts can be obtained in a nucleic acid BRIEF DESCRIPTION OF THE DRAWINGS database ( e.g. , a Gencode database , for example Gencode [ 0057 ] The following drawings form part of the present V23 , Genbank database , EMBL database , or other data specification and are included to further demonstrate certain base ) . aspects of the present disclosure, which can be better under [ 0046 ] In some embodiments , the method further com stood by reference to one or more of these drawings in prises, prior to performing the removal of the non - coding combination with the detailed description of specific transcripts, aligning the RNA expression data to a reference ; embodiments presented herein . The drawings are not nec and annotating the RNA expression data . essarily drawn to scale . [ 0047 ] In some embodiments, the RNA expression data [ 0058 ] FIG . 1A and FIG . 1B provide exemplary flow comprises at least 25 million paired - end reads . In some charts that illustrate processes of sample preparation and embodiments, the RNA expression data comprises at least quality control . FIG . 1A provides an example of a process 50 million paired -end reads, with an average read length of pipeline that includes one or more quality control assess at least 100 bp . ments during biopsy sample collection , DNA / RNA extrac US 2021/0005284 A1 Jan. 7 , 2021 5 tion and library construction, and /or nucleic acid bioinfor [ 0069 ] FIG . 8 is an exemplary flowchart that illustrates a matic analysis . FIG . 1B provides an example of a process for process 800 showing computerized processes for processing obtaining a biopsy sample of a subject, extracting nucleic and validating sequence data and related information . acid from the sample , sequencing the nucleic acid , and [ 0070 ] FIG . 9 is a block diagram of an illustrative com processing the nucleic acid sequence to identify one or more puter system 500 that may be used to implement one or more cancer therapies appropriate for the subject. embodiments of a process pipeline for preparing , assessing , [ 0059 ] FIGS . 2A and 2B provide graphical representations and / or analyzing sequence data . of the levels and distribution of RNA transcripts depending [ 0071 ] FIG . 10 is a block diagram of an illustrative envi on the type of RNA enrichment methods used and whether ronment 600 in which one or more embodiments of the stranded or non - stranded RNA was used for sequencing . technology described in this application may be imple FIG . 2A provides a graphical representation of distribution mented . of RNA after RNA enrichment by either depletion of ribo [ 0072 ] FIG . 11 shows the results of MHC allele analysis somal RNA ( r -RNA ) or by poly A enrichment. FIG . 2B for sequence information obtained from three nucleic acids provides a graphical representation of levels of RNA mea (RNA -Seq , WES Tumor, and WES Normal) for two subjects sured after RNA sequencing of either stranded or non ( 103 and 105 ) . stranded RNA for IL24 , ICAM4 , and GAPDH . [ 0073 ] FIG . 12 shows an example of a bar graph repre [ 0060 ] FIG . 3 is a graphical representation of the distri senting the probability of sequence information being from bution of different RNA transcripts for different types of a particular type of tumor ( e.g. , BRCA related to breast RNA as shown in the legend . Each column represents a cancer ). unique sample. All samples were prepared from the same [ 0074 ] FIGS . 13A - 13B show graphs representing an tissue type, using the same RNA enrichment method and the example of the relationship between protein subunit expres same sequencing service . The bottom panel shows the data sion levels . in the top panel in Transcripts per Kilobase Million ( TPM ) . [ 0075 ] FIGS . 14A - 14B show examples of bar graphs [ 0061 ] FIG . 4A shows the distribution of poly - A tails of representing the probability that sequence information was RNA transcripts from HeLa cell samples, and several obtained from samples that contained only polyadenylated examples of poly - A tails for histone family genes . RNA or from samples that contained total or all RNA ( total [ 0062 ] FIG . 4B shows a comparison of expression of RNA ). mitochondrial RNA for samples that are either poly - A [ 0076 ] FIG . 15 shows an example of a principal compo enriched or enriched by rRNA depletion ( denoted by total nent analysis and illustrates the analysis of three batches of RNA ). gene expression data including tumor and normal samples. [ 0063 ] FIG . 4C shows a comparison of expression of Histone coding RNA for samples that are either poly - A DETAILED DESCRIPTION enriched or enriched by rRNA depletion (denoted by total [ 0077 ] Recent advances in personalized genomic sequenc RNA ). ing and cancer genomic sequencing technologies have made [ 0064 ] FIG . 5 shows a principal component analysis it possible to obtain patient - specific information about can ( PCA ) of RNA expression of cell samples containing dif cer cells ( e.g. , tumor cells ) and cancer microenvironments ferent percentages of HeLa cells , either with or without from one or more biological samples obtained from indi polyA enrichment, and with or without data filtration . Data vidual patients. The inventors have appreciated that this filtration included removal of non - coding RNA transcripts , information may be used to characterize the type ( s ) of histone - coding transcripts , and mitochondrial transcripts. cancer a patient has and , potentially, select one or more The PCAO component describes major differences effective therapies for the patient. This information may also between poly - A and total -RNA sequencing. The PCA1 be used to determine how a patient is responding over time component describes different cell line ratios . Samples were to a treatment and , if necessary, to select a new therapy or prepared as a mixture of two different cell lines in 5 different therapies for the patient as necessary. This information may ratios . also be used to determine whether a patient should be [ 0065 ] FIG . 6A is an exemplary flowchart that illustrates included or excluded from participating in a clinical trial . a process 200 for obtaining enriched RNA sequence data [ 0078 ] The inventors have recognized that the workflow from a tumor of a subject having, suspected of having, or at used to obtain sequence data for a patient strongly influences risk of having cancer. the inferences that can be drawn about the patient's cancer . [ 0066 ] FIG . 6B is an exemplary flowchart that illustrates Such inferences include , but are not limited to , determining a process 210 for obtaining bias - corrected gene expression whether the patient will respond to a particular therapy or data from RNA expression data to identify a cancer treat therapies, whether the patient will have an adverse reaction ment for a subject having, suspected of having, or at risk of to a particular therapy or therapies, whether the patient is a having cancer . candidate for enrollment in a clinical trial, whether the [ 0067 ] FIG . 6C is an exemplary flowchart that illustrates patient has one or more particular biomarkers ( e.g. , bio a process 220 for processing RNA obtained from a tumor markers indicative of potential response to a therapy , bio sample to identify a cancer treatment for a subject having, markers indicative of survival, etc. ), whether the patient's suspected of having, or at risk of having cancer . disease has progressed ( e.g. , from an earlier stage cancer to [ 0068 ] FIG . 7 is flowchart of an illustrative process pipe a later stage cancer, relapsed from remission , etc. ) , whether line 300 comprising bioinformatic quality control processes a different therapy or therapies should be selected for the for assessing nucleic acid sequence data obtained from a patient, and / or any other suitable prognostic , diagnostic, tumor sample and using the nucleic acid sequence data to and / or clinical inferences . identify a cancer treatment for a subject having , suspected of [ 0079 ] When the workflow used to obtain sequence data having, or at risk of having cancer . contains errors , sub - optimal processing, sources of bias in US 2021/0005284 A1 Jan. 7 , 2021 6 the data, and the like , it is often not possible to make of these three categories may be utilized in a workflow to inferences about the subject's cancer with the desired or obtain sequence data for a patient, though it should be necessary confidence , or even make any such inferences at appreciated that this is not a limitation of the techniques all . Even worse , errors in the workflow for producing described herein , and that, in some embodiments, any one or sequence data may result in incorrect inferences about the more of the techniques ( but not necessarily all of them ) may patient, potentially leading to incorrect treatment or missed be used in a workflow . opportunities for better treatment . Moreover, workflow [ 0084 ] As one example , in some embodiments, novel errors lead to wasted resources in the laboratory ( e.g. , having sample preparation techniques and post - processing tech to reprocess samples) and wasted computing resources ( e.g. , niques include obtaining sequencing data and removing performing expensive computational processing on mega sources of bias from the sequencing data by : ( 1 ) obtaining a bytes and gigabytes of sequence data , taking up processor first biological sample of a first tumor, the first biological and networking resources, only to discard the results at a sample previously obtained from a subject having, suspected later time and / or have to repeat the processing ) . of having or at risk of having cancer ; ( 2 ) extracting RNA [ 0080 ] A conventional workflow used to obtain sequence from the first biological sample of the first tumor to obtain data for a patient includes multiple steps including : obtain extracted RNA ; ( 3 ) enriching the extracted RNA for coding ing a biological sample from the patient ( e.g. , by performing RNA to obtain enriched RNA ; ( 4 ) sequencing, using at least a biopsy , obtaining a blood sample , a salivary sample or any one sequencing platform , the enriched RNA to obtain RNA other suitable biological sample from the patient ), preparing expression data comprising at least 5 kilobases ( kb ); and ( 5 ) the biological sample for sequencing using a sequencing using at least one computer hardware processor to perform : platform ( e.g., a next generation sequencing ( NGS ) plat ( a ) obtaining the RNA expression data using the at least one form ), and obtaining raw data output by the sequencing sequencing platform ; ( b ) converting the RNA expression platform . Various conventional bio - informatics processing data to gene expression data; ( C ) determining bias -corrected pipelines and other algorithms may then use the raw data gene expression data from the gene expression data at least output by the sequencing platform in an attempt to make one in part by removing , from the gene expression data , expres or more of the above - described inferences. sion data for at least one gene that introduces bias in the gene [ 0081 ] However, such conventional workflows for obtain expression data ; and ( d ) identifying a cancer treatment for ing sequencing data are prone to errors at all stages . For the subject using the bias - corrected gene expression data . example, errors may be made in a laboratory when handling [ 0085 ] Removing bias from the gene expression data in samples for multiple patients. Indeed , it is not uncommon this way provides an improvement to sequencing technology for a laboratory to receive a biological sample asserted to be for numerous reasons . First , it removes artefacts and sources from one patient, when that sample is from another patient. of bias from sequencing data, resulting in fewer errors in any As another example , a biological sample may not be pro downstream processing and a higher fidelity output. Second , cessed properly by the laboratory and may not have the the inventors have recognized that removing sources of bias concentration and / or quality of nucleic acid needed for in this way allows for more accurately and faithfully repre subsequent analysis. As yet another example, errors may be senting a patient's molecular functional characteristics ( e.g. , introduced by the sequencing platform itself and / or subse via molecular functional expression signatures described quent post processing steps ( e.g. , alignment and variant herein ). The inventors have recognized that the bias- cor calling ). As yet another example, raw sequencing data rected gene expression data may be used to identify more produced by a sequencing platform may contain artefacts effective therapies for a patient, improve ability to determine and undesired sequences and / or transcripts. Other examples whether one or more cancer therapies will be effective if of various errors are described herein . administered to the patient, improve the ability to identify [ 0082 ] In some embodiments, sequence data or sequenc clinical trials in which the subject may participate, and / or ing data may include raw DNA or RNA sequence data , DNA improvements to numerous other prognostic, diagnostic, and exome sequence data ( e.g. , from whole exome sequencing clinical applications. ( WES ) , DNA genome sequence data ( e.g. , from whole [ 0086 ] As another example, in some embodiments, novel genome sequencing ( WGS ) ) , RNA expression data, gene quality control techniques include using at least one com expression data , bias - corrected gene expression data or any puter hardware processor to perform : ( a ) obtaining nucleic other suitable type of sequence data comprising data acid data comprising: ( i ) sequence data indicating a nucleo obtained from a sequencing platform and / or comprising data tide sequence for at least 5 kilobases (kb ) of DNA and / or derived from data obtained from a sequencing platform . RNA from a previously obtained biological sample of a [ 0083 ] To address shortcomings of conventional work subject having, suspected of having , or at risk of having a flows for obtaining sequencing data for patients , the inven disease ; and ( ii ) asserted information indicating an asserted tors have developed techniques that address various sources source and / or an asserted integrity of the sequence data ; and of error that may be present in sequencing data . These ( b ) validating the nucleic acid data by : ( i ) processing the techniques developed by the inventors include: ( 1 ) novel sequence data to obtain determined information indicating a sample preparation techniques to prepare biological samples determined source and / or a determined integrity of the for sequencing using one or multiple sequencing platforms; sequence data ; and ( ii ) determining whether the determined ( 2 ) novel techniques for post processing raw data output by information matches the asserted information . Examples of the sequencing platform ( s ) to filter out irrelevant data and various such validation techniques are described herein , and sources of bias ( e.g. , transcripts for non - coding regions and they are important examples of quality control techniques expression data associated with genes that introduce bias in developed by the inventors and described herein . the sequence data ); and ( 3 ) novel quality control techniques [ 0087 ] Employing such quality control techniques also that facilitate the detection and remediation of errors in the provides an improvement to sequencing technology and sequence data . In some embodiments , techniques from each computer technology. First , sequencing data that do not pass US 2021/0005284 Al Jan. 7 , 2021 7 one or more quality control checks are not used for some or bias ) subsequent analysis of the gene expression data . In all of downstream processing reducing or eliminating errors some embodiments , gene expression data is normalized in downstream applications ( e.g. , identifying biomarkers , ( e.g. , after removal of the data for the one or more interfering tumor microenvironment types, possible therapies for a genes ). In some embodiments , one or more sequence quality patient, etc. ) . Often such downstream processing requires control assessments are performed on DNA and / or RNA performing expensive ( frequently cloud - based ) computa sequence data ( e.g. , on processed , for example normalized , tional processing of large data sets ( e.g. , sequencing data gene expression data ) for bioinformatic quality control act may contain tens of millions of reads , which have to be 104. In some embodiments , one or more bioinformatic aligned , annotated and processed in other numerous ways ). quality control assessments are performed to determine Using quality control to prevent computationally expensive whether sequence data is from an expected source ( e.g. , processes from executing will reduce or eliminate wasteful patient, tissue , tumor, etc.) and /or whether it has sufficient use of computing resources , saving processing power , integrity for further analysis. In some embodiments, memory , and networking resources (which is an improve sequence data that satisfies bioinformatic quality control act ment to computing technology in addition to being an 104 is further processed, for example, to determine a diag improvement to sequencing technology ). Identifying errors nosis , prognosis, and / or therapy for a subject, to evaluate will also reduce waste of resources at a laboratory that and / or monitor a subject, and / or for one or more clinical processes multiple samples, by freeing up equipment for applications ( e.g. , to evaluate a therapy ). processing biological samples that have passed initial qual [ 0090 ] In some embodiments, the sequence data may ity control checks . In addition , using sequence data for include raw DNA or RNA sequence data , DNA exome downstream processing that has passed various quality con sequence data ( e.g. , from whole exome sequencing ( WES ) , trol checks may be used to identify more effective therapies DNA genome sequence data ( e.g. , from whole genome for a patient, improve ability to determine whether one or sequencing ( WGS ) ) , RNA expression data , gene expression more cancer therapies will be effective if administered to the data, bias - corrected gene expression data or any other suit patient , improve the ability to identify clinical trials in which able type of sequence data comprising data obtained from a the subject may participate, and / or improvements to numer sequencing platform and / or comprising data derived from ous other prognostic , diagnostic, and clinical applications. data obtained from a sequencing platform including, but not [ 0088 ] FIGS . 1A and 1B illustrate examples of process limited to , examples of such data described herein . pipelines for sample preparation and quality control as [ 0091 ] FIG . 1B illustrates a non - limiting process pipeline described herein . The process pipeline in FIG . 1 illustrate 110 for preparing nucleic acid from a biological sample embodiments of methods and systems provided in the pres ( e.g. , a tumor biopsy ) and obtaining and processing nucleic ent disclosure and are not to be construed in any way as acid sequence data for subsequent analysis ( e.g. , for diag limiting their scope . The present disclosure provides that a nostic , prognostic , therapeutic, and / or other clinical appli process pipeline does not need to include all of the process cations ) . Process pipeline 110 is performed by obtaining a steps or the order of process steps illustrated in FIG . 1. One biological sample ( e.g. , a tumor sample ) from a subject or more processes can be omitted , repeated , or performed in having, suspected of having, or at risk of having cancer at act a different order depending on the application . 111. Nucleic acid ( e.g. , DNA and / or RNA ) is obtained ( e.g. , [ 0089 ] FIG . 1A illustrates a non -limiting process pipeline extracted ) from the sample at act 112. One or more quality 100 that includes one or more quality control assessments . control assessments of the nucleic acid is performed in act A biological sample ( e.g. , a tumor biopsy ) is obtained for a 113. One or more nucleic acid libraries is prepared in act subject ( e.g. , a subject having, suspected of having, or at risk 114 , for example using nucleic acid that satisfies at least one of having cancer ) in act 101. In some embodiments , the quality control assessment of act 113. The nucleic acid sample is obtained from a physician , hospital , clinic , or other libraries are sequenced using at least one sequencing plat healthcare provider. One or more sample quality control form in sequencing act 115 ( e.g. , to obtain RNA expression assessments at quality control act 102 can be performed on data for RNA ). In some embodiments , RNA expression data the biological sample. In some embodiments , a quality is converted to gene expression data in act 116 , and the gene control assessment on the biological sample ( e.g. , biopsy expression data is optionally bias - corrected , at least in part, material) comprises determining whether the sample is in an by removing expression data for at least one gene that appropriate form ( e.g. , fresh frozen or FFPE ) and / or is introduces bias in the gene expression data . One or more accompanied by sufficient information to identify the nature bioinformatic quality control assessments are performed on and source of the sample . Subsequently, nucleic acid ( e.g. , the DNA sequence data or RNA sequence data from act 115 DNA and / or RNA ) can be extracted from a biological and /or RNA sequence data ( e.g. , the bias - corrected gene sample that satisfies sample quality control act 102. One or expression data from act 116 ) in bioinformatics quality more nucleic acid quality control assessments at act 103 then control act 117. In some embodiments , nucleic acid data can be performed , for example to evaluate one or more ( e.g. , that satisfies at least one bioinformatic quality control physical attributes of the extracted nucleic acid , of a nucleic assessments of act 117 ) , is further processed in act 118 ( e.g. , acid library prepared from the extracted nucleic acid , and / or to determine one or more indicia of disease from the gene of pooled nucleic acids or libraries. Subsequently, nucleic expression data ), to perform a diagnostic , prognostic , thera acid ( e.g. , DNA and / or RNA ) that satisfies nucleic acid peutic, and / or other clinical assessment of the subject ( e.g. , quality control act 103 can be processed ( e.g. , to enrich for to identify a treatment, for example a cancer treatment, for polyA RNA ) and / or sequenced to obtain raw DNA and / or the subject) in act 119. In some embodiments, a treatment RNA sequence data ( e.g. , RNA expression data ) . In some ( e.g. , a cancer treatment) is administered to the subject . embodiments, RNA expression data can be processed to [ 0092 ] In some embodiments , act 111 comprises obtaining obtain gene expression data and optionally to remove data bulk biopsy tissues of a subject or a patient. In some for one or more types of genes that could interfere with ( e.g. , embodiments, act 111 comprises obtaining a blood sample US 2021/0005284 A1 Jan. 7 , 2021 8 of a subject or a patient. In some embodiments , act 111 example as described in Nicolas L Bray, Harold Pimentel, comprises obtaining a single cell suspension . In some Pall Melsted and Lior Pachter, Near - optimal probabilistic embodiments , act 111 comprises obtaining any types of RNA - seq quantification , Nature Biotechnology 34 , 525-527 sample that are suitable for preparing nucleic acids for ( 2016 ) , doi: 10.1038 /nbt.3519 ), and / or Gencode ( e.g. , Gen subsequent sequencing analysis. In some embodiments, act code V23 ) is used for sequence alignment, and / or annota 111 comprises obtaining more than one type of samples. tion . In some embodiments , act 116 comprises gene aggre [ 0093 ] In some embodiments, when the bulk biopsy tis gation . In some embodiments, act 116 comprises removing sues are obtained , the tissues are processed ( e.g. , homog expression data for one or more non - coding transcripts from enized in the presence of TriZol) to extract nucleic acids the gene expression data . In some embodiments, act 116 such as DNA or RNA at act 112. In some embodiments , comprises removing expression data for one or more genes when a single cell suspension is obtained , the suspension is that can bias the gene expression data . In some embodi processed to extract nucleic acids such as DNA or RNA at ments, act 116 comprises removing expression data for act 112. In some embodiments , nucleic acids can be histone encoding genes and / or mitochondrial- encoding extracted that are suitable for germline whole exome genes . In some embodiments, act 116 comprises normaliza sequencing ( WES ) at act 112. In some embodiments , nucleic tion ( e.g. , TPM normalization ) after removal of expression acids can be extracted that are suitable for tumor whole data for non - coding and /or bias - associated genes from the exome sequencing ( WES ) at act 112. In some embodiments, gene expression data . This normalization may be termed nucleic acids can be extracted that are suitable for tumor " renormalization ” herein . RNA sequencing at act 112. In some embodiments, nucleic [ 0098 ] At act 117 , one or more bioinformatic quality acids can be extracted that are suitable for CYTOF ( mass control assessments is performed on nucleic acid sequence cytometry ) at act 112. In some embodiments , nucleic acids data, for example DNA sequence data and / or RNA sequence can be extracted that are suitable for any type of sequencing data ( e.g. , bias corrected , and / or normalized , gene expres known in the art at act 112 . sion data ) . In some embodiments, one or more bioinformatic [ 0094 ] At act 113 , one or more quality control assessments quality control assessments can be performed to evaluate the can be performed . Acceptable and / or target thresholds can source and / or integrity of the nucleic acid sequence data . In be determined and used as references. In some embodi some embodiments , one or more bioinformatic quality con ments, the total amount of extracted DNA or RNA can be trol assessments described in this application are performed . used for quality control assessment. In some embodiments, [ 0099 ] In some embodiments , a method comprises all a spectrophotometer, for example a small volume full processes illustrated in FIG . 1. However, in some embodi spectrum , UV - visible spectrophotometer ( e.g. , NanoDrop ments , a subset of the processes is performed and any one or spectrophotometer available from Thermo Fisher Scientific , more of the processes may be omitted , duplicated , and / or www.thermofisher.com ) can be used for quality control performed in a different order than illustrated in FIG . 1. In assessment of DNA or RNA . In some embodiments , a some embodiments, a method comprises a process , option fluorometer, for example for quantification of DNA or RNA ally including one or more quality control steps , for prepar ( e.g. , a Qubit fluorometer available from Thermo Fisher ing a nucleic acid from a biological sample , wherein the Scientific, www.thermofisher.com ) can be used for quality nucleic acid is sequenced on at least one sequencing plat control assessment of DNA or RNA . In some embodiments, form . In some embodiments , a method comprises processing an automated electrophoresis system ( e.g. , TAPESTATION ) nucleic acid information obtained ( e.g. , received ) from a can be used for quality control assessment of DNA or RNA . sequencing platform to generate DNA or RNA sequence In some embodiments , a real- time PCR system ( e.g. , data for subsequent analysis ( e.g. , to generate bias - corrected , LIGHTCYCLER® ) can be used for quality control assess optionally normalized gene expression data for subsequent ment of DNA or RNA . analysis ). In some embodiments, one or more processes of [ 0095 ] In some embodiments, act 114 comprises preparing FIG . 1 are implemented on a computer. In some embodi libraries for the extracted nucleic acids that have satisfied at ments , a method comprises identifying a treatment ( e.g. , a least one quality control threshold at act 113. In some cancer treatment) for a subject ( e.g. , a subject having , embodiments , act 114 comprises one or more methods suspected of having, or at risk of having cancer ). In some described in Example 2 . embodiments, a method comprises administering the treat [ 0096 ] In some embodiments, act 115 comprises sequenc ment to the subject. ing nucleic acid ( e.g. , the DNA , RNA , or related libraries of act 114 ) to obtain DNA sequence data and / or RNA sequence Biological Samples data ( e.g. , RNA expression data ) using at least one nucleic acid sequencing platform ( e.g. , a next generation nucleic [ 0100 ] Any of the methods, systems, or other claimed acid sequencing platform ). Sequence data obtained at act elements may use or be used to analyze a biological sample 115 can be stored in any suitable format ( e.g. , in the form of from a subject. In some embodiments, a biological sample is one or more FASTQ files ). obtained from a subject having or suspected of having [ 0097 ] In some embodiments , RNA expression data is cancer . One or more biological samples from a subject may converted to gene expression data at act 116. In some be analyzed as described herein to obtain information about embodiments, RNA expression data is aligned to known the subject's cancer . The biological sample may be any type genes in a database , for example to a known assembled of biological sample including, for example, a biological genome ( e.g. , a ) or to a transcriptome in the sample of a bodily fluid ( e.g. , blood , urine or cerebrospinal database . In some embodiments, a program for quantifying fluid ), one or more cells ( e.g. , from a scraping or brushing transcripts, for example from bulk and single - cell RNA - Seq such as a cheek swab or tracheal brushing ), a piece of tissue data , using high -throughput sequencing reads ( e.g. , Kallisto ( cheek tissue , muscle tissue , lung tissue , heart tissue , brain ( hg38 ) available from Github, www.github.com , for tissue , or skin tissue ) , or some or all of an organ ( e.g. , brain , US 2021/0005284 A1 Jan. 7 , 2021 9 lung , liver, bladder, kidney, pancreas , intestines, or muscle ) , ( such as blood ( e.g. , whole blood , blood serum , or blood or other types of biological samples ( e.g. , feces or hair ) . plasma ), saliva , tears, synovial fluid , cerebrospinal fluid , [ 0101 ] In some embodiments, the biological sample is a pleural fluid , pericardial fluid , ascitic fluid , and /or urine ], sample of a tumor from a subject. In some embodiments , the hair, skin ( including portions of the epidermis, dermis , biological sample is a sample of blood from a subject. In and / or hypodermis ), oropharynx, laryngopharynx , esopha some embodiments , the biological sample is a sample of gus , stomach , bronchus, salivary gland , tongue , oral cavity, tissue from a subject. nasal cavity, vaginal cavity , anal cavity, bone, bone marrow , [ 0102 ] A sample of a tumor , in some embodiments , refers brain , thymus, spleen , small intestine, appendix , colon , to a sample comprising cells from a tumor. In some embodi rectum , anus , liver, biliary tract, pancreas, kidney, ureter, ments, the sample of the tumor comprises cells from a bladder, urethra, uterus, vagina , vulva , ovary , cervix , scro benign tumor, e.g. , non - cancerous cells . In some embodi tum , penis , prostate , testicle , seminal vesicles , and / or any ments, the sample of the tumor comprises cells from a type of tissue ( e.g. , muscle tissue , epithelial tissue , connec premalignant tumor, e.g. , precancerous cells . In some tive tissue , or nervous tissue ) . embodiments, the sample of the tumor comprises cells from [ 0109 ] Any of the biological samples described herein a malignant tumor, e.g. , cancerous cells . may be obtained from the subject using any known tech [ 0103 ] Examples of tumors include, but are not limited to , nique. See , for example , the following publications on adenomas, fibromas, hemangiomas, lipomas , cervical dys collecting , processing , and storing biological samples , each plasia , metaplasia of the lung , leukoplakia , carcinoma , sar of which are incorporated herein in its entirety: Biospeci coma , germ cell tumors , and blastoma . mens and biorepositories: from afterthought to science by [ 0104 ] A sample of blood , in some embodiments, refers to Vaught et al . ( Cancer Epidemiol Biomarkers Prey . 2012 a sample comprising cells , e.g. , cells from a blood sample. February ; 21 ( 2 ) : 253-5 ) , and Biological sample collection , In some embodiments, the sample of blood comprises non processing, storage and information management by Vaught cancerous cells . In some embodiments, the sample of blood and Henderson ( IARC Sci Publ. 2011 ; ( 163 ) : 23-42 ) . comprises precancerous cells . In some embodiments, the [ 0110 ] In some embodiments, the biological sample may sample of blood comprises cancerous cells . In some embodi be obtained from a surgical procedure ( e.g. , laparoscopic ments , the sample of blood comprises blood cells . In some surgery , microscopically controlled surgery , or endoscopy ), embodiments, the sample of blood comprises red blood bone marrow biopsy, punch biopsy , endoscopic biopsy , or cells . In some embodiments , the sample of blood comprises needle biopsy ( e.g. , a fine - needle aspiration , core needle white blood cells . In some embodiments, the sample of biopsy, vacuum - assisted biopsy, or image - guided biopsy ) . In blood comprises platelets . Examples of cancerous blood some embodiments, the biological sample may be obtained cells include , but are not limited to , leukemia , lymphoma, from an autopsy . and myeloma. In some embodiments , a sample of blood is [ 0111 ] In some embodiments, one or more than one cell collected to obtain the cell - free nucleic acid ( e.g. , cell - free ( i.e. , a cell biological sample ) may be obtained from a DNA ) in the blood . subject using a scrape or brush method . The cell biological [ 0105 ] A sample of blood may be a sample of whole blood sample may be obtained from any area in or from the body or a sample of fractionated blood . In some embodiments, the of a subject including , for example, from one or more of the sample of blood comprises whole blood . In some embodi following areas : the cervix, esophagus, stomach , bronchus, ments , the sample of blood comprises fractionated blood . In or oral cavity. In some embodiments , one or more than one some embodiments, the sample of blood comprises buffy piece of tissue ( e.g. , a tissue biopsy ) from a subject may be coat . In some embodiments, the sample of blood comprises used . In certain embodiments , the tissue biopsy may com serum . In some embodiments, the sample of blood com prise one or more than one ( e.g. , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , or prises plasma . In some embodiments, the sample of blood more than 10 ) biological samples from one or more tumors comprises a blood clot . or tissues known or suspected of having cancerous cells . [ 0106 ] A sample of a tissue , in some embodiments, refers [ 0112 ] Any of the biological samples from a subject to a sample comprising cells from a tissue . In some embodi described herein may be stored using any method that ments , the sample of the tumor comprises non - cancerous preserves stability of the biological sample . In some cells from a tissue . In some embodiments, the sample of the embodiments , preserving the stability of the biological tumor comprises precancerous cells from a tissue . In some sample means inhibiting components ( e.g. , DNA , RNA , embodiments, the sample of the tumor comprises precan protein , or tissue structure or morphology ) of the biological cerous cells from a tissue . sample from degrading until they are measured so that when [ 0107 ] Methods of the present disclosure encompass a measured , the measurements represents the state of the variety of tissue including organ tissue or non - organ tissue , sample at the time of obtaining it from the subject. In some including but not limited to , muscle tissue , brain tissue , lung embodiments, a biological sample is stored in a composition tissue , liver tissue , epithelial tissue , connective tissue , and that is able to penetrate the same and protect components nervous tissue . In some embodiments , the tissue may be ( e.g. , DNA , RNA , protein , or tissue structure or morphol normal tissue or it may be diseased tissue or it may be tissue ogy ) of the biological sample from degrading. As used suspected of being diseased . In some embodiments, the herein , degradation is the transformation of a component tissue may be sectioned tissue or whole intact tissue . In some from one from to another such that the first form is no longer embodiments, the tissue may be animal tissue or human detected at the same level as before degradation . tissue . Animal tissue includes , but is not limited to , tissues [ 0113 ] In some embodiments, the biological sample is obtained from rodents ( e.g. , rats or mice ) , primates ( e.g. , stored using cryopreservation . Non - limiting examples of monkeys ), dogs , cats, and farm animals. cryopreservation include , but are not limited to , step - down [ 0108 ] The biological sample may be from any source in freezing, blast freezing, direct plunge freezing, snap freez the subject’s body including, but not limited to , any fluid ing , slow freezing using a programmable freezer, and vitri US 2021/0005284 Al Jan. 7 , 2021 10 fication . In some embodiments , the biological sample is ( e.g., during a different procedure including a procedure 1 , stored using lyophilisation . In some embodiments , a bio 2,3,4,5,6,7,8,9 , 10 days; 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 weeks; logical sample is placed into a container that already con 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 months, 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , tains a preservant ( e.g. , RNALater to preserve RNA ) and 10 years , or 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 decades after a first then frozen ( e.g. , by snap - freezing ), after the collection of procedure ). the biological sample from the subject. In some embodi [ 0118 ] A second or subsequent biological sample may be ments , such storage in frozen state is done immediately after taken or obtained from the same region ( e.g. , from the same collection of the biological sample . In some embodiments , a tumor or area of tissue ) or a different region ( including , e.g. , biological sample may be kept at either room temperature or a different tumor ). A second or subsequent biological sample 4 ° C. for some time ( e.g. , up to an hour, up to 8 h , or up to may be taken or obtained from the subject after one or more 1 day, or a few days ) in a preservant or in a buffer without treatments and may be taken from the same region or a a preservant, before being frozen . different region . As a non - limiting example , the second or [ 0114 ] Non -limiting examples of preservants include for subsequent biological sample may be useful in determining malin solutions , formaldehyde solutions , RNALater or other whether the cancer in each biological sample has different equivalent solutions , TriZol or other equivalent solutions , characteristics ( e.g. , in the case of biological samples taken DNA / RNA Shield or equivalent solutions , EDTA ( e.g. , from two physically separate tumors in a patient) or whether Buffer AE ( 10 mM Tris.Cl; 0.5 mM EDTA , pH 9.0 ) ) and the cancer has responded to one or more treatments ( e.g. , in other coagulants, and Acids Citrate Dextronse ( e.g. , for the case of two or more biological samples from the same blood specimens ) . tumor or different tumors prior to and subsequent to a [ 0115 ] In some embodiments, special containers may be treatment) . In some embodiments, each of the at least one used for collecting and / or storing a biological sample. For biological sample is a bodily fluid sample, a cell sample, or example, a vacutainer may be used to store blood . In some a tissue biopsy sample . embodiments , a vacutainer may comprise a preservant ( e.g. , [ 0119 ] In some embodiments, one or more biological a coagulant, or an anticoagulant ). In some embodiments, a specimens are combined ( e.g. , placed in the same container container in which a biological sample is preserved may be for preservation ) before further processing . For example, a contained in a secondary container, for the purpose of better first sample of a first tumor obtained from a subject may be preservation , or for the purpose of avoid contamination . combined with a second sample of a second tumor from the [ 0116 ] Any of the biological samples from a subject subject, wherein the first and second tumors may or may not described herein may be stored under any condition that be the same tumor . In some embodiments , a first tumor and preserves stability of the biological sample. In some a second tumor are similar but not the same ( e.g. , two tumors embodiments, the biological sample is stored at a tempera in the brain of a subject ). In some embodiments , first ture that preserves stability of the biological sample. In some biological sample and a second biological sample from a embodiments , the sample is stored at room temperature subject are sample of different types of tumors ( e.g. , a tumor ( e.g. , 25 ° C. ) . In some embodiments, the sample is stored in muscle tissue and brain tissue ) . under refrigeration ( e.g. , 4 ° C. ) . In some embodiments , the [ 0120 ] In some embodiments, a sample from which RNA sample is stored under freezing conditions ( e.g. , -20 ° C. ) . In and / or DNA is extracted ( e.g. , a sample of tumor, or a blood some embodiments, the sample is stored under ultralow sample ) is sufficiently large such that at least 2 ug ( e.g. , at temperature conditions ( e.g. , -50 ° C. to -800 ° C. ) . In some least 2 ug , at least 2.5 ug , at least 3 ug , at least 3.5 ug or embodiments, the sample is stored under liquid nitrogen more ) of RNA can be extracted from it . In some embodi ( e.g. , -1700 ° C. ) . In some embodiments , a biological sample ments , the sample from which RNA and / or DNA is extracted is stored at -60 ° C. to -8 ° C. ( e.g. , -70 ° C. ) for up to 5 years can be peripheral blood mononuclear cells ( PBMCs ) . In ( e.g. , up to 1 month , up to 2 months, up to 3 months, up to some embodiments, the sample from which RNA and / or 4 months, up to 5 months, up to 6 months, up to 7 months, DNA is extracted can be any type of cell suspension . In some up to 8 months, up to 9 months, up to 10 months, up to 11 embodiments , a sample from which RNA and /or DNA is months, up to 1 year, up to 2 years , up to 3 years , up to 4 extracted ( e.g. , a sample of tumor, or a blood sample ) is years , or up to 5 years ). In some embodiments , a biological sufficiently large such that at least 1.8 ug RNA can be sample is stored as described by any of the methods extracted from it . In some embodiments, at least 50 mg ( e.g. , described herein for up to 20 years ( e.g. , up to 5 years , up at least 1 mg , at least 2 mg , at least 3 mg , at least 4 mg , at to 10 years, up to 15 years , or up to 20 years ). least 5 mg , at least 10 mg , at least 12 mg , at least 15 mg , at [ 0117 ] Methods of the present disclosure encompass least 18 mg , at least 20 mg , at least 22 mg , at least 25 mg , obtaining one or more biological samples from a subject for at least 30 mg , at least 35 mg , at least 40 mg , at least 45 mg , analysis. In some embodiments, one biological sample is or at least 50 mg ) of tissue sample is collected from which collected from a subject for analysis. In some embodiments, RNA and / or DNA is extracted . In some embodiments, at more than one ( e.g. , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , least 20 mg of tissue sample is collected from which RNA 15 , 16 , 17 , 18 , 19 , 20 , or more ) biological samples are and / or DNA is extracted . In some embodiments, at least 30 collected from a subject for analysis. In some embodiments , mg of tissue sample is collected . In some embodiments, at one biological sample from a subject will be analyzed . In least 10-50 mg ( e.g. , 10-50 mg, 10-15 mg, 10-30 mg, 10-40 some embodiments, more than one ( e.g. , 2 , 3 , 4 , 5 , 6 , 7 , 8 , mg , 20-30 mg , 20-40 mg , 20-50 mg , or 30-50 mg ) of tissue 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , or more ) sample is collected from which RNA and / or DNA is biological samples may be analyzed . If more than one extracted . In some embodiments, at least 30 mg of tissue biological sample from a subject is analyzed , the biological sample is collected . In some embodiments , at least 20-30 mg samples may be procured at the same time ( e.g. , more than of tissue sample is collected from which RNA and / or DNA one biological sample may be taken in the same procedure ), is extracted . In some embodiments , a sample from which or the biological samples may be taken at different times RNA and / or DNA is extracted ( e.g. , a sample of tumor, or a US 2021/0005284 A1 Jan. 7 , 2021 11 blood sample) is sufficiently large such that at least 0.2 ug measurements and assessment of single cells in a tumor ( e.g. , at least 200 ng , at least 300 ng , at least 400 ng , at least sample provides information that is not confounded by 500 ng , at least 600 ng , at least 700 ng , at least 800 ng , at genotypic or phenotypic heterogeneity of bulk samples. In least 900 ng , at least 1 ug , at least 1.1 ug , at least 1.2 ug , at some embodiments , a single cell - suspension is prepared least 1.3 ug , at least 1.4 ug , at least 1.5 ug , at least 1.6 ug , from one or more biological samples obtained from a subject at least 1.7 ug , at least 1.8 ug , at least 1.9 ug , or at least 2 for use in methods such as single - cell RNA or DNA sequenc ug ) of RNA can be extracted from it . In some embodiments, ing , or mass cytometry . a sample from which RNA and /or DNA is extracted ( e.g. , a [ 0123 ] Accordingly, some embodiments of any one of the sample of tumor, or a blood sample ) is sufficiently large such methods described herein comprise forming a single - cell that at least 0.1 ug ( e.g. , at least 100 ng , at least 200 ng , at suspension of cells from a sample of tumor ( e.g. , a first least 300 ng , at least 400 ng , at least 500 ng , at least 600 ng , sample of tumor ). In some embodiments, forming a single at least 700 ng , at least 800 ng , at least 900 ng , at least 1 ug , cell suspension of cells comprises from a sample of tumor at least 1.1 ug , at least 1.2 ug , at least 1.3 ug , at least 1.4 ug , comprises dissecting a tumor sample to obtain tumor sample at least 1.5 ug , at least 1.6 ug , at least 1.7 ug , at least 1.8 ug , fragments. A curved scissor may be used to dissect a tumor at least 1.9 ug , or at least 2 ug ) of RNA can be extracted from tissue sample. In some embodiments, a tumor sample frag it . ment is 0.5-3 mm ( e.g. , 1-2 mmº) . In some embodiments, a tumor tissue sample or fragment thereof is kept moist Subjects while dissecting. [ 0121 ] Aspects of this disclosure relate to a biological [ 0124 ] A method of preparing a single - cell suspension sample that has been obtained from a subject. In some from a tumor sample may comprise any one or more of the embodiments, a subject is a mammal ( e.g. , a human , a following steps in any order: fine mincing , enzymatic and /or mouse , a cat, a dog , a horse, a hamster, a cow , a pig , or other non - enzymatic digestion , vigorous pipetting, passage domesticated animal ). In some embodiments, a subject is a through a cell - strainer , washing, and counting. In some human . In some embodiments , a subject is an adult human embodiments , one or more of these steps are repeated ( e.g. , ( e.g. , of 18 years of age or older ). In some embodiments, a 1 time , 2 times , 3 times , 4 times , or 5 or more times ) . subject is a child ( e.g. , less than 18 years of age ) . In some [ 0125 ] In some embodiments, a tumor sample or tumor embodiments, a human subject is one who has or has been sample fragment/ s is incubated in an cocktail. Any diagnosed with at least one form of cancer. In some embodi number and combination of can be used , see e.g. , ments , a cancer from which a subject suffers is a carcinoma, BioFiles : For Life Science Research , Issue 2 , 2006 www . a sarcoma , a myeloma, a leukemia , a lymphoma, or a mixed sigmaaldrich.com/content/dam/sigmaaldrich/docs/Sigma/ type of cancer that comprises more than one of a carcinoma, General_Information / 2 /biofiles_issue2.pdf , which is incor a sarcoma , a myeloma , a leukemia , and a lymphoma. porated herein by reference in its entirety , and especially to Carcinoma refers to a malignant neoplasm of epithelial incorporate herein any of enzyme or other component ( e.g. , origin or cancer of the internal or external lining of the body. media ) listed therein . Sarcoma refers to cancer that originates in supportive and [ 0126 ] Quatromoni et al . ( An optimized disaggregation connective tissues such as bones, tendons , cartilage, muscle , method for human lung tumors that preserves the phenotype and fat . Myeloma is cancer that originates in the plasma cells and function of the immune cells ; J Leukoc Biol . 2015 of bone marrow . Leukemias ( “ liquid cancers ” or “ blood January ; 97 ( 1 ) : 201-209 ) provides a comparison of different cancers ” ) are cancers of the bone marrow ( the site of blood enzymatic cocktails and is incorporated herein by reference cell production ). Lymphomas develop in the glands or nodes in its entirety . In some embodiments, an enzyme cocktail of the lymphatic system , a network of vessels , nodes, and comprises any one or more of the following components: organs ( specifically the spleen , tonsils , and thymus ) that media ( e.g. , L - 15 media ), anti- bacterials ( e.g. , penicillin purify bodily fluids and produce infection - fighting white and / or streptomycin ), anti - fungals ( e.g. , amphoterecin ), col blood cells , or lymphocytes. Non - limiting examples of a lagenase ( e.g., collagenase I , collagenase II , collagenase IV ) , mixed type of cancer include adenosquamous carcinoma, DNAse ( e.g. , DNAse I ) , elastase , , and pro mixed mesodermal tumor, carcinosarcoma, and teratocarci teases ( e.g. , protease XIV , trypsin , papain , or termolysin ). noma . In some embodiments , a subject has a tumor . A tumor Coll I has the original balance of collagenase , caseinase , may be benign or malignant. In some embodiments, a cancer clostripain , and tryptic activities ; Coll II contains higher is any one of the following: skin cancer, lung cancer, breast relative levels of protease activity, particularly clostripain ; cancer, prostate cancer, colon cancer, rectal cancer, cervical and Coll IV is designed to be especially low in tryptic cancer, and cancer of the uterus . In some embodiments, a activity ( Quatromoni et al . , J Leukoc Biol . 2015 January ; subject is at risk for developing cancer, e.g. , because the 97 ( 1 ) : 201-209 ) . In some embodiments, only collagenase I , subject has one or more genetic risk factors, or has been collagenase II , or collagenase IV is used . In some embodi exposed to or is being exposed to one or more carcinogens ments, a mixture of two collegenases is used ( e.g. , collage ( e.g. , cigarette smoke , or chewing tobacco ) . nase I and collagenase II , collagenase I and collagenase IV , or collagenase II and collagenase IV ) . In some embodi Single Cell Suspensions ments , more than 2 collegenases are used ( e.g. , collagenase [ 0122 ] In some embodiments, methods ( e.g. , RNA I , collagenase II , and collagenase IV ) . sequencing, DNA sequencing , or multiplexed flow cytom [ 0127 ] In some embodiments, an enzyme cocktail com etry ) to characterize a cancer that a subject has or is prises one or more of the following components: media ( e.g. , suspected of having is performed at a single - cell level to complete media ) , penicillin , streptomycin , collagenase ( e.g. , capture the heterogeneity of a single tumor or cancerous collagenase I or collagenase IV ) . Concentrations of enzymes tissue , or multiple tumors or cancerous tissues . That is , in a cocktail can be adjusted . A non - limiting example of an US 2021/0005284 A1 Jan. 7 , 2021 12 enzyme cocktail is as follows : collagenase I ( 0.2 mg/ ml ) , isolated from a biological sample are cultured and expanded collagenase IV ( 1 mg /ml ) , complete medium , penicillin and then stored . In some embodiments, cells isolated from a ( 0.001 % ) , and DNAse . biological sample are cultured and expanded after storage . [ 0128 ] In some embodiments, at least 25 ml ( e.g. , at least [ 0134 ] In some embodiments, any one of the methods 25 ml , at least 26 ml , at least 27 ml , at least 28 ml , at least described herein further comprises forming a lysate from at 29 ml , or at least 30 ml ) of enzyme cocktail is added per 0.5 least a part ( e.g. , a first or second part) of the single - cell gm of tumor tissue . In some embodiments, a sample of suspension . In some embodiments, different parts of a tumor or fragments thereof is incubated in enzyme cocktail single - cell suspension comprise different types of cells . In while the sample is being shaken or agitated ( e.g. , tumbled some embodiments , a part of the single - cell suspension from at 85 RPM , and / or vigorous pipetting ). In some embodi which a lysate is formed comprises at least 1x100 cells ( e.g. , ments , sample of tumor or fragments thereof is incubated in at least 1x10 cells , at least 2x10 cells , at least 3x10 cells , enzyme cocktail at a temperature between 20-50 ° C. ( e.g. , at least 4x10 cells , or at least 5x10 cells ) . In some embodi 20-50 ° C. , 20-25 ° C. , 25-30 ° C. , 25-35 ° C. , 30-40 ° C. , ments, a part of the single - cell suspension from which a 35-45 ° C. , 40-50 ° C. , or 30-50 ° C. ) . In some embodiments, lysate is formed comprises at least 2x10 cells . Lysate may a method of preparing a single -cell suspension comprises be stored in storage mediums that will prevent the degra filtering the enzyme cocktail , e.g. , through a cell strainer dation of DNA and / or RNA ( e.g. , RNALater ). In some ( e.g. , 50 um , 70 um , or 100 um ). In some embodiments , too embodiments, a method comprises extracting RNA from the fine of a filter may result in a cell composition having a high lysate from a single - cell suspension or each part of a concentration of fibroblast cells . In some embodiments , too single - cell suspension and performing RNA sequencing on coarse a filter may result in clumps of cells . In some the extracted RNA to obtain RNA expression data . These embodiments, clumps of cells are disaggregated using RNA expression data can be used to determine the hetero mechanical force ( e.g. , vigorous pipetting, or applying pres geneity of a tumor. sure using a syringe ). [ 0135 ] An overview of single - cell RNA sequencing can be [ 0129 ] In some embodiments, filtered cells are lysed using found at hemberg-lab.githib.io/scRNA.seq.course/introduc RBC lysis buffer to lyse red blood cells . RBC lysis buffers tion - to - single -cell - rna -seq.html , FIG . 2.1 of which is incor are available commercially ( see e.g. , www.abcam.com/red porated by reference herein . In some embodiments, a blood - cell -rbc - lysis - buffer - ab204733.html ). method of performing RNA sequencing a single - cell sus [ 0130 ] In some embodiments , a method of preparing a pension comprises single - cell RNA isolation , reverse tran single - cell suspension comprises enzymatic and mechanical scription cDNA pre - amplification, cDNA library preparation dissociation . Examples of methods of dissociating cells from ( e.g. , using Fluidigm Cl Protocol) , and sequencing of tissue can be found in the following publications: Quatro sequenced using platforms such as Illumina HiSeq 2500 . moni et al . , An optimized disaggregation method for human [ 0136 ] Methods of performing single - cell RNA sequenc lung tumors that preserves the phenotype and function of the ing are described by the following references , each of which immune cells ; J Leukoc Biol. 2015 January ; 97 ( 1 ) : 201-209 , is incorporated herein by reference in its entirety : Bagnoli et Pennartz et al . , Generation of Single - Cell Suspensions from al . ( Studying Cancer Heterogeneity by Single - Cell RNA Mouse Neural Tissue; JOVE Issue 29 ; doi : 10.3791 / 1267 ; Sequencing; Methods Mol Biol . 2019 ; 1956 : 305-319 ) ; Sun Published : Jul. 7 , 2009 , and www.youtube.com/ et al . ( Single -cell RNA sequencing reveals gene expression watch ? y = N0jfty YqM38 . signatures of breast cancer - associated endothelial cells ; [ 0131 ] In some embodiments, cell - dissociation buffers Oncotarget . 2018 Feb. 16 ; 9 ( 13 ) : 10945-10961 ) ; Kulkarni et that do not contain enzymes are used . See e.g. , Ther al . ( Beyond bulk : a review of single cell transcriptomics moFisher Scientific catalog numbers 13151014 and methodologies and applications; Curr Opin Biotechnol. 13150016 , or Millipore Sigma Aldrich catalog number 2019 Apr. 9 ; 58 : 129-136 ) ; Huang et al ( High Throughput S - 014 - B . Heng et al . ( Biol Proced Online . 2009 ; 11 : 161 Single Cell RNA Sequencing, Bioinformatics Analysis and 169 ) provides a comparison of enzymatic and non -enzy Applications, Adv Exp Med Biol . 2018 ; 1068 : 33-43 ) ; Zili matic means of dissociating cells and is incorporated herein onis et al . ( Single - Cell Transcriptomics of Human and by reference in its entirety. Mouse Lung Cancers Reveals Conserved Myeloid Popula [ 0132 ] In some embodiments , the number of cells in a tions across Individuals and Species ; Immunity. 2019 Apr. 5 . single - cell suspension is counted and their viability tested . pii : 51074-7613 ( 19 ) 30126-8 ) ; and Kashima et al . ( An Infor The Examples below provide an example of an overall mative Approach to Single - Cell Sequencing Analysis ; Adv process of forming a single - cell suspension from a sample of Exp Med Biol . 2019 ; 1129 : 81-96 . doi : 10.1007 /978-981-13 tumor tissue . 6037-4_6 ) ; Seki et al . ( Single -Cell DNA - Seq and RNA -Seq [ 0133 ] In some embodiments, a method comprises form in Cancer Using the C1 System ; Adv Exp Med Biol . 2019 ; ing a single - cell suspension of cells from a sample of tumor 1129 : 27-50 . doi : 10.1007 / 978-981-13-6037-4_3 ) ; and See et and partitioning in into at least a first and second part. A first al . ( A Single - Cell Sequencing Guide for Immunologists; and second part of a single -cells suspension may be of equal Front Immunol. 2018 ; 9 : 2425 ) . size or of different sizes ( e.g. , comprising a different number [ 0137 ] Gan et al . ( Identification of cancer subtypes from of cells ) . In some embodiments all the parts of a single - cell single - cell RNA - seq data using a consensus clustering suspension ( e.g. , a first part, a second part, and so on ) are method ; BMC Med Genomics . 2018 ; 11 ( Suppl 6 ) : 117 ) stored in separate containers and stored under the same or describes a clustering method for single - cell RNA sequenc similar conditions ( e.g. , in liquid nitrogen , or —80 ° C. ) . In ing data , which is incorporated herein in its entirety by some embodiments, the different parts of a single - cell sus reference . pension are stored under different conditions, either before [ 0138 ] In some embodiments, any one of the following or after any further processing ( e.g. , labeling with antibodies single - cell RNA sequencing methods is used : Fluidigm Ci for protein expression studies ). In some embodiments , cells system ( SMART- seq ), Fluidigm C1 system (mRNA Seq US 2021/0005284 A1 Jan. 7 , 2021 13

HT ) , SMART - seq2, 10x Genomics Chromium system , and ing” may include assessing the presence, absence, quantity MARS - seq . See et al . ( Front Immunol. 2018 ; 9 : 2425 ) and / or amount ( which can be an effective amount) of a provides a comparison of these methods and is incorporated substance within a sample, including the derivation of herein in its entirety by reference . qualitative or quantitative concentration levels of such sub [ 0139 ] In some embodiments , any one of the methods stances, or otherwise evaluating the values and / or categori described herein further comprises performing measurement zation of such substances in a sample from a subject. of a single - cell suspension . In some embodiments , different [ 0144 ] The level of a protein may be measured using an measurements are made in parallel on the same cells . immunoassay. Examples of immunoassays include any Macaulay et al . ( Trends Genet. 2017 February ; 33 ( 2 ) : 155 known assay (without limitation ), and may include any of 168 ) describes methods of making multiple measurements the following: immunoblotting assay ( e.g. , Western blot ) , from single cells and is incorporated herein by reference in immunohistochemical analysis , flow cytometry assay , its entirety. immunofluorescence assay ( IF ) , enzyme linked immunosor [ 0140 ] In some embodiments , any one of the methods bent assays ( ELISAs ) ( e.g. , sandwich ELISAs ) , radioimmu described herein further comprises performing mass cytom noas says , electrochemiluminescence -based detection etry on at least a first part of the single -cell suspension . Mass assays , magnetic immunoassays, lateral flow assays , and cytometry is a mass spectrometry technique based on induc related techniques. Additional suitable immunoassays for tively coupled plasma mass spectrometry and time of flight detecting a level of a protein provided herein will be mass spectrometry used for the determination of the prop apparent to those of skill in the art . erties of cells . In some embodiments, mass cytometry com [ 0145 ] Such immunoassays may involve the use of an prises conjugating antibodies with isotopically pure ele agent ( e.g. , an antibody ) specific to the target protein . An ments , and then using them to label cellular molecules ( e.g. , agent such as an antibody that “ specifically binds ” to a target ). In some embodiments, cells are nebulized and sent protein is a term well understood in the art, and methods to through an argon plasma , which ionizes the metal- conju determine such specific binding are also well known in the gated antibodies . The metal signals are then analyzed by a art . An antibody is said to exhibit “ specific binding” if it time - of - flight mass spectrometer to identify and quantify the reacts or associates more frequently, more rapidly, with cellular molecules in the cells . In some embodiments, a greater duration and /or with greater affinity with a particular single - cell suspension or part thereof on which mass cytom target protein than it does with alternative proteins . It is also etry is performed comprises at least 1x10 cells ( e.g. , at least understood by reading this definition that, for example, an 1x10 cells , at least 2x10 cells , at least 3x10 cells , at least antibody that specifically binds to a first target peptide may 4x10 cells , at least 5x10 cells , at least 6x10 cells , at least or may not specifically or preferentially bind to a second 7x10 cells , at least 8x10 cells , at least 9x10 cells , or at target peptide. As such , " specific binding" or " preferential least 10x106 cells ) . In some embodiments, a single - cell binding ” does not necessarily require ( although it can suspension or part thereof on which mass cytometry is include ) exclusive binding . Generally, but not necessarily, performed comprises at least 5x10 cells . reference to binding means preferential binding. In some [ 0141 ] Methods of performing mass cytometry are examples , an antibody that “ specifically binds ” to a target described by the following references, each of which is peptide or an epitope thereof may not bind to other peptides incorporated herein by reference in its entirety : Galli et al . or other epitopes in the same antigen . In some embodiments , ( The end of omics ? High dimensional single cell analysis in a sample may be contacted , simultaneously or sequentially , precision medicine ; Eur J Immunol. 2019 February ; 49 ( 2 ) : with more than one binding agent that binds different 212-220 ) ; Brodin ( The biology of the cell — insights from proteins ( e.g. , multiplexed analysis ). mass cytometry ; FEBS J. 2018 Nov. 3. doi : 10.1111 / febs. [ 0146 ] As used herein , the term “ antibody ” refers to a 14693 ) ; Olsen et al . ( The anatomy of single cell mass protein that includes at least one immunoglobulin variable cytometry data ; Cytometry A. 2019 February ; 95 ( 2 ) :156 domain or immunoglobulin variable domain sequence . For 172 ) ; Behbehani ( Applications of Mass Cytometry in Clini example, an antibody can include a heavy ( H ) chain variable cal Medicine : The Promise and Perils of Clinical CyTOF ; region ( abbreviated herein as VH ) , and a light ( L ) chain Clin Lab Med . 2017 December; 37 ( 4 ) : 945-964 ) ; Gond variable region ( abbreviated herein as VL ) . In another halekar et al . ( Alternatives to current flow cytometry data example , an antibody includes two heavy ( H ) chain variable analysis for clinical and research studies ; Methods. 2018 regions and two light ( L ) chain variable regions . The term Feb. 1 ; 134-135 : 113-129 ) ; and Soares et al . ( Go with the " antibody " encompasses antigen -binding fragments of anti flow : advances and trends in magnetic flow cytometry ; Anal bodies ( e.g. , single chain antibodies , Fab and sFab frag Bioanal Chem . 2019 March ; 411 ( 9 ) : 1839-1862 . doi : ments , F ( ab ' ) 2 , Fd fragments, Fv fragments , scFv , and 10.1007 / s00216-019-01593-9 . Epub 2019 Feb. 19 ) . domain antibodies ( dAb ) fragments ( de Wildt et al . , Eur J Immunol. 1996 ; 26 ( 3 ) : 629-39 . ) ) as well as complete anti Other Assays bodies . An antibody can have the structural features of IgA , [ 0142 ] Any of the biological samples described herein can IgG , IgE , IgD , IgM ( as well as subtypes thereof ). Antibodies be used for obtaining expression data using conventional may be from any source including , but not limited to , assays or those described herein . Expression data , in some primate ( human and non - human primate ) and primatized embodiments, includes gene expression levels. Gene expres ( such as humanized ) antibodies. sion levels may be detected by detecting a product of gene [ 0147 ] In some embodiments , the antibodies as described expression such as mRNA and / or protein . herein can be conjugated to a detectable label and the [ 0143 ] In some embodiments , gene expression levels are binding of the detection reagent to the peptide of interest can determined by detecting a level of a protein in a sample be determined based on the intensity of the signal released and / or by detecting a level of activity of a protein in a from the detectable label . Alternatively, a secondary anti sample . As used herein , the terms “ determining ” or “ detect body specific to the detection reagent can be used . One or US 2021/0005284 A1 Jan. 7 , 2021 14 more antibodies may be coupled to a detectable label . Any In some embodiments, the contacting is performed by cap suitable label known in the art can be used in the assay illary action in which a sample is moved across a surface of methods described herein . In some embodiments , a detect the support membrane . able label comprises a fluorophore. As used herein , the term [ 0153 ] In some embodiments, an assay may be performed “ fluorophore ” ( also referred to as “ fluorescent label” or in a low - throughput platform , including single assay format . “ fluorescent dye ” ) refers to moieties that absorb light energy In some embodiments , an assay may be performed in a at a defined excitation wavelength and emit light energy at high - throughput platform . Such high - throughput assays may a different wavelength . In some embodiments, a detection comprise using a binding agent immobilized to a solid moiety is or comprises an enzyme. In some embodiments, an support ( e.g. , one or more chips ) . Methods for immobilizing enzyme is one ( e.g. , B - galactosidase ) that produces a colored a binding agent will depend on factors such as the nature of product from a colorless substrate . the binding agent and the material of the solid support and [ 0148 ] It will be apparent to those of skill in the art that may require particular buffers. Such methods will be evident this disclosure is not limited to immunoassays. Detection to one of ordinary skill in the art . assays that are not based on an antibody, such as mass Extraction of DNA and / or RNA spectrometry, are also useful for the detection and / or quan [ 0154 ] In some embodiments of any one of the methods tification of a protein and / or a level of protein as provided described herein , RNA is extracted from a biological sample herein . Assays that rely on a chromogenic substrate can also to prevent it from being degraded and / or to prevent the be useful for the detection and / or quantification of a protein inhibition of enzymes in downstream processing , e.g. , the and / or a level of protein as provided herein . preparation of DNA ( i.e. , a cDNA library from RNA ). In [ 0149 ] Alternatively , the level of nucleic acids encoding a some embodiments of any one of the methods described gene in a sample can be measured via a conventional herein , DNA is extracted from a biological sample to prevent method . In some embodiments, measuring the expression it from being degraded and / or to prevent the inhibition of level of nucleic acid encoding the gene comprises measuring enzymes in downstream processing , e.g. , the preparation of mRNA . In some embodiments , the expression level of DNA . In some embodiments, the term " extraction ” in the mRNA encoding a gene can be measured using real - time context of obtaining DNA or RNA from a biological sample reverse transcriptase (RT ) Q - PCR or a nucleic acid microar is used interchangeably with the term “ isolation .” ray. Methods to detect nucleic acid sequences include, but [ 0155 ] Methods described herein involve extraction of are not limited to , polymerase chain reaction ( PCR ) , reverse RNA and / or DNA from a biological sample ( e.g. , a tumor transcriptase -PCR (RT - PCR ), in situ PCR , quantitative PCR sample or sample of blood ). As described above, a biological ( Q - PCR ) , real - time quantitative PCR (RT Q - PCR ) , in situ sample may be comprised of more than one sample from one hybridization , Southern blot , Northern blot , sequence analy or more than one tissues ( e.g. , one or more than one different sis , microarray analysis , detection of a reporter gene , or tumors ). In some embodiments, RNA and / or DNA are other DNA /RNA hybridization platforms. extracted from a combined sample. In some embodiments, RNA and or DNA is extracted from multiple biological [ 0150 ] In some embodiments, the level of nucleic acids samples from a subject, and then then combined before encoding a gene in a sample can be measured via a hybrid further processing ( e.g. , storage, or DNA library prepara ization assay . In some embodiments, the hybridization assay tion ) . In some embodiments , more than one sample of comprises at least one binding partner. In some embodi extracted RNA and / or DNA are combined with each other ments, the hybridization assay comprises at least one oligo after retrieval from storage . In some embodiments, at least nucleotide binding partner. In some embodiments , the tumor DNA is extracted from one or more tumor tissues . In hybridization assay comprises at least one labeled oligo some embodiments , at least tumor RNA is extracted from nucleotide binding partner. In some embodiments, the one or more tumor tissues . In some embodiments, at least hybridization assay comprises at least one pair of oligo normal DNA is extracted from one of more normal tissues nucleotide binding partners. In some embodiments, the to serve as a control. In some embodiments, at least normal hybridization assay comprises at least one pair of labeled RNA is extracted from one of more normal tissues to serve oligonucleotide binding partners . as a control. Protocols of DNA /RNA extraction can be found [ 0151 ] Any binding agent that specifically binds to a at least in Example 2 . desired nucleic acid or protein may be used in the methods [ 0156 ] Methods for extracting DNA and / or RNA from and kits described herein to measure an expression level in biological samples are known in the art, and reagents and a sample . In some embodiments , the binding agent is an kits for doing so are commercially available . Gómez - Acata antibody or an aptamer that specifically binds to a desired et al . (Methods for extracting ' omes from microbialites , J protein . In other embodiments, the binding agent may be one Microbiol Methods. 2019 Mar. 12 ; 160 : 1-10 ) describes or more oligonucleotides complementary to a nucleic acid or methods for extracting applied for DNA and RNA extraction a portion thereof. In some embodiments, a sample may be from microbialites and describes their advantages and dis contacted , simultaneously or sequentially , with more than advantages and is incorporated herein by reference in its one binding agent that binds different proteins or different entirety. The methods described in Gomez - Acata et al . are nucleic acids ( e.g. , multiplexed analysis ) . generally applicable for RNA and / or DNA extracted from [ 0152 ] To measure an expression level of a protein or tissue . Moore ( Curr Protoc Immunol. 2001 May ; Chapter nucleic acid , a sample can be in contact with a binding agent 10 :Unit 10.1 ) describes purification and concentration of under suitable conditions. In general, the term " contact ” DNA from aqueous solutions and is also incorporated by refers to an exposure of the binding agent with the sample reference herein in its entirety . or cells collected therefrom for suitable period sufficient for [ 0157 ] In some embodiments, extracting DNA and / or the formation of complexes between the binding agent and RNA comprises lysing cells of a biological sample and the target protein or target nucleic acid in the sample, if any . isolating DNA and /or RNA from other cellular components. US 2021/0005284 A1 Jan. 7 , 2021 15

Examples of methods for lysing cells include, but are not nucleic acids or by combining nucleic acids extracted from limited to , mechanical lysis , liquid homogenization , sonica one or more biological samples. In some embodiments, a tion , freeze -thaw , chemical lysis , alkaline lysis , and manual first biological sample is combined with a second biological grinding sample to form a combined sample and extracting DNA [ 0158 ] Methods for extracting DNA and / or RNA include, and / or RNA from the combined sample. In some embodi but are not limited to , solution phase extraction methods and ments , extracted DNA and / or RNA from a first biological solid - phase extraction methods. In some embodiments, a sample may be combined with extracted DNA and / or RNA solution phase extraction method comprises an organic from a second biological sample . extraction method , e.g. , a phenol chloroform extraction [ 0164 ] Systems and methods described herein encompass method . In some embodiments, a solution phase extraction extracting any type of DNA and / or RNA from a biological method comprises a high salt concentration extraction sample . In some embodiments, extracting DNA comprises method , e.g. , guanidinium thiocyantate (GuTC ) or guani extracting genomic DNA ( gDNA ). In some embodiments, dinium chloride (GuCl ) extraction method . In some embodi extracting DNA comprises extracting mitochondrial DNA . ments, a solution phase extraction method comprises an In some embodiments, extracting RNA comprises extracting ethanol precipitation method . In some embodiments, a solu messenger RNA (mRNA ). In some embodiments, extracting tion phase extraction method comprises an isopropanol RNA comprises extracting precursor mRNA ( pre -mRNA ). precipitation method . In some embodiments , a solution In some embodiments, extracting RNA comprises extracting phase extraction method comprises an ethidium bromide ribosomal RNA ( rRNA ). In some embodiments, extracting ( EtBr ) -Cesium Chloride ( CsCl ) gradient centrifugation RNA comprises extracting transfer RNA ( TRNA ). method . In some embodiments, extracting DNA and / or RNA [ 0165 ] In some embodiments, a single kit is used to purity comprises a nonionic detergent extraction method , e.g. , a DNA and RNA from the same sample. A non - limiting cetyltrimethylammonium bromide ( CTAB ) extraction example of kit for doing so is the Qiagen AllPrep DNA /RNA method . kit . In some embodiments , robotics is employed to carry out [ 0159 ] In some embodiments, extracting DNA and / or DNA and / or RNA extraction . RNA comprises a solid phase extraction method . Any solid [ 0166 ] In some embodiments, if a sample of extracted phase that binds to DNA and / or RNA may be used for RNA is not of sufficient yield and / or quality, anyone of the extracting DNA and / or RNA in methods and systems following outcome may occur . First , there may be an over described herein . Examples of solid phases that bind DNA representation of common transcripts in RNA sequencing and / or RNA include , but are not limited to , silica matrices, data , and underrepresentation of low abundance transcripts. ion exchange matrices, glass particles , magnetizable cellu Second , poor quality RNA can lead to insufficient read lose beads , polyamide matrices , and nitrocellulose mem lengths ( i.e. , reads are shorter ) and / or inadequate read qual branes . ity leading to potential misidentification of RNA . [ 0160 ] In some embodiments , a solid phase extraction [ 0167 ] For whole exome sequencing, poor quantity and method comprises a spin - column based extraction method . quality of DNA can lead to misidentification of base pairs In some embodiments, a solid phase extraction method leading to false variant discovery ( e.g. , a false positive ) or comprises a bead - based extraction method . In some embodi incidences where variants are not identified ( e.g. , a false ments , a solid phase extraction method comprises a cation negative ) . Another problem that can arise resulting from low exchange resin , e.g. , a styrene divinylbenzene copolymer DNA quantity and / or quality is inadequate coverage of the resin . exome ( e.g. , missing sequences ). [ 0161 ] Systems and methods described herein encompass [ 0168 ] In some embodiments, before extracted RNA and / extracting DNA and / or RNA from a single biological sample or DNA is processed further for RNA sequencing or whole or a plurality of biological samples. In some embodiments , exome sequencing ( WES ) , the quality and / or quantity of extracting DNA comprises extracting DNA from a single RNA or DNA is checked . In some embodiments, a sample sample . In some embodiments, extracting DNA comprises of extracted RNA is at least 1000-6000 ng in total mass . In extracting DNA from a plurality of samples. In some some embodiments, a sample of extracted RNA is at least embodiments, extracting DNA comprises extracting DNA 100-60000 ng ( e.g. , 100-60000 ng , 500-30000 ng , 800 from a first sample and a second sample . In some embodi 20000 ng , 1000-15000 ng , 1000-10000 ng , 1000-8000 ng , ments , extracting DNA comprises extracting DNA from one 1000-6000 ng , 10000-20000 ng , 20000-60000 ng ) in total or more , two or more , three or more , four or more , five or mass . In some embodiments, the acceptable total RNA more , six or more , seven or more , eight or more , nine or amount for further sequencing is at least 100-1,000 ng ( e.g. , more , or ten or more samples. 100-1,000 ng , 500-1,000 ng , or 300-900 ng ) . In some [ 0162 ] In some embodiments, extracting RNA comprises embodiments, the target total RNA amount for further extracting RNA from a single sample . In some embodi sequencing is more than 200-1,000 ng ( e.g. , 200-1,000 ng , ments , extracting RNA comprises extracting RNA from a 500-1,000 ng , or 300-1,000 ng ). In some embodiments , the plurality of samples. In some embodiments, extracting RNA purity of a sample of extracted RNA is such that it corre comprises extracting RNA from a first sample and a second sponds to a ratio of absorbance at 260 nm to absorbance at sample . In some embodiments, extracting RNA comprises 280 nm of at least 1 ( e.g. , at least 1 , at least 1.2 , at least 1.4 , extracting RNA from one or more , two or more , three or at least 1.6 , at least 1.8 , or at least 2 ) . In some embodiments, more , four or more , five or more , six or more , seven or more , the purity of a sample of extracted RNA is such that it eight or more , nine or more , or ten or more samples. corresponds to a ratio of absorbance at 260 nm to absorbance [ 0163 ] Extracted DNA and / or RNA from a biological at 280 nm of at least 2. The ratio of absorbance at 260 nm sample may be combined with extracted DNA and /or RNA and 280 nm is used to assess the purity of DNA and RNA . from another biological sample. This may be accomplished A ratio of -1.8 is generally accepted as “ pure ” for DNA ; a by combining one or more biological samples and extracting ratio of -2.0 is generally accepted as " pure” for RNA . If the US 2021/0005284 A1 Jan. 7 , 2021 16 ratio is appreciably lower in either case , it may indicate the [ 0172 ] In some embodiments , a sample of extracted DNA presence of protein , phenol or other contaminants that has a target concentration of at least 4 ng / ul ( e.g. , 4 ng /ul , 6 absorb strongly at or near 280 nm . Absorbances can be ng / ul, 8 ng / ul ) . In some embodiments , a sample of extracted measured using a spectrophotometer. DNA has an acceptable concentration of at least 2.5 ng / ul ( e.g. , 2.5 ng / ul , 4.5 ng / ul , 5.5 ng / ul ) . In some embodiments, [ 0169 ] In some embodiments, the purity or integrity of the concentration of the extracted DNA is performed by extracted RNA or DNA ( e.g. , a DNA fragment library ) by Tapestation any one of the methods described herein is such that it corresponds to a RNA integrity number ( RIN ) of at least 4 [ 0173 ] In some embodiments , a sample of extracted RNA ( e.g. , at least 4 , at least 5 , at least 6 , at least 7 , at least 8 , or has a target concentration of at least 2 ng / ul ( e.g. , 2 ng / ul, 4 at least 9 ) . In some embodiments, the purity of extracted ng / ul, 6 ng / ul ) . In some embodiments , a sample of extracted nucleic acid ( e.g. , RNA or DNA ) by any one of the methods RNA has an acceptable concentration of at least 4 ng / ul ( e.g. , described herein is such that it corresponds to a RNA 4 ng / ul , 6 ng / ul , 10 ng / ul ) . In some embodiments, the integrity number ( RIN ) of at least 7. RIN has been demon concentration of the extracted DNA is performed by a strated to be robust and reproducible in studies comparing it fluorometer, for example for quantification of DNA or RNA to other RNA integrity calculation algorithms, cementing its ( e.g. , a Qubit fluorometer available from ThermoFisher position as a preferred method of determining the quality of Scientific, www.thermofisher.com ). RNA to be analyzed ( Imbeaud et al . , Towards standardiza [ 0174 ] In some embodiments, a sample of extracted RNA tion of RNA quality assessment using user - independent has a target concentration of at least 4 ng / ul ( e.g. , 4 ng /ul , 6 classifiers of microcapillary electrophoresis traces ; Nucleic ng / ul, 8 ng / ul ) . In some embodiments, a sample of extracted Acids Research . 33 ( 6 ) : e56 ) . RNA has an acceptable concentration of at least 1.5 ng / ul [ 0170 ] In some embodiments , a sample of extracted DNA ( e.g. , 1.5 ng / ul , 3.5 ng / ul , 5.5 ng / ul ) . In some embodiments , is at least 100-20000 ng ( e.g. , 100-20000 ng , 500-15000 ng , the concentration of the extracted RNA is performed by 800-10000 ng , 1000-15000 ng , 1000-10000 ng , 1000-8000 Tapestation . In some embodiments, the acceptable RNA ng , 1000-6000 ng , or 1000-2000 ng ) in total mass . In some integrity number ( RIN ) is at least 5 ( e.g. , 5 , 6 , 7 ) . In some embodiments , a sample of extracted DNA is at least 1000 embodiments, the target RNA integrity number ( RIN ) is at 2000 ng in total mass . In some embodiments, the acceptable least 8 ( e.g. , 8 , 9 , 10 ) . In some embodiments, the RIN is total DNA amount for further sequencing is at least 20-200 performed by Tapestation . ng ( e.g. , 20-200 ng , 30-200 ng , or 50-150 ng ) . In some [ 0175 ] In some embodiments , the target purity of a sample embodiments, the target total DNA amount for further of extracted RNA is such that it corresponds to a range of a sequencing is more than 30-200 ng ( e.g. , 30-200 ng , 50-200 ratio of absorbance at 260 nm to absorbance at 280 nm of at ng , or 100-200 ng ). In some embodiments, the target purity least 1.8-2 ( e.g. , at least 1.8-2 , at least 1.8-1.9 ) . In some of a sample of extracted DNA is such that it corresponds to embodiments , the purity of a sample of extracted RNA is a range of a ratio of absorbance at 260 nm to absorbance at such that it corresponds to a ratio of absorbance at 260 nm 280 nm of at least 1.8-2 ( e.g. , at least 1.8-2 , at least 1.8-1.9 ) . to absorbance at 280 nm of at least 1.8 . In some embodi In some embodiments, the purity of a sample of extracted ments, the acceptable purity of a sample of extracted RNA DNA is such that it corresponds to a ratio of absorbance at is such that it corresponds to a ratio of absorbance at 260 nm 260 nm to absorbance at 280 nm of at least 1 ( e.g. , at least to absorbance at 280 nm of at least 1.5 ( e.g. , at least 1.5 , at 1 , at least 1.2 , at least 1.4 , at least 1.6 , at least 1.8 , or at least least 1.7 , at least 2 ) . In some embodiments, the target purity 2 ) . In some embodiments, the acceptable purity of a sample of a sample of extracted RNA is such that it corresponds to of extracted DNA is such that it corresponds to a ratio of a range of a ratio of absorbance at 260 nm to absorbance at absorbance at 260 nm to absorbance at 280 nm of at least 1.5 230 nm of at least 2-2.2 ( e.g. , at least 2-2.2 , at least 2-2.1 ) . ( e.g. , at least 1.5 , at least 1.7 , at least 2 ) . In some embodi In some embodiments , the acceptable purity of a sample of ments , the target purity of a sample of extracted DNA is such extracted RNA is such that it corresponds to a ratio of that it corresponds to a range of a ratio of absorbance at 260 absorbance at 260 nm to absorbance at 230 nm of at least 1.5 nm to absorbance at 230 nm of at least 2-2.2 ( e.g. , at least ( e.g. , at least 1.5 , at least 1.7 , at least 2 ) . In some embodi 2-2.2 , at least 2-2.1 ) . In some embodiments , the acceptable ments , the purity of a sample of extracted RNA as described purity of a sample of extracted DNA is such that it corre herein is analyzed by a spectrophotometer, for example a sponds to a ratio of absorbance at 260 nm to absorbance at small volume full- spectrum , UV - visible spectrophotometer 230 nm of at least 1.5 ( e.g. , at least 1.5 , at least 1.7 , at least ( e.g., Nanodrop spectrophotometer available from Ther 2 ) . In some embodiments, the purity of a sample of extracted moFisher Scientific , www.thermofisher.com ). In some DNA as described herein is analyzed by a spectrophotom embodiments, the concentration of extracted DNA is at least eter, for example a small volume full - spectrum , UV - visible 10-2000 ng / ul ( e.g. , 10-2000 ng /ul , 10-1000 ng / ul, 10-200 spectrophotometer ( e.g. , Nanodrop spectrophotometer avail ng / ul, 1-200 ng / ul, 0.5-400 ng / ul , 0.5-200 ng / ul , 100-200 able from ThermoFisher Scientific , www.thermofisher.com ). ng/ ul, 100-400 ng / ul, 100-500 ng /ul , 50-500 ng /ul , or 50-250 [ 0171 ] In some embodiments, a sample of extracted DNA ng /ul ). has a target concentration of at least 4.5 ng /ul ( e.g. , 4.5 ng /ul , [ 0176 ] Protocols for quality control of a sample of 5.5 ng / ul, 6.5 ng / ul ) . In some embodiments , a sample of extracted RNA or DNA can be found at least in Example 6 . extracted DNA has an acceptable concentration of at least 3 In some embodiments, the purity of a sample of extracted ng / ul ( e.g. , 3 ng / ul, 5 ng / ul, 10 ng / ul ). In some embodiments , DNA and / or RNA as described herein can be analyzed by the concentration of the extracted DNA is performed by a any other suitable technologies or tools . In some embodi fluorometer, for example for quantification of DNA or RNA ments , a sample of extracted RNA or DNA is not processed ( e.g. , a Qubit fluorometer available from ThermoFisher further if it does not meet a particular quantity or purity Scientific , www.thermofisher.com ). standard as described above . In some embodiments , if a US 2021/0005284 A1 Jan. 7 , 2021 17 sample of extracted RNA or DNA does not meet a particular interest. Other platforms or tools suitable for targeted quantity or purity standard , it is combined with another mRNA enrichment can also be used . sample . [ 0181 ] Examples of non - targeted mRNA enrichment methods include polyA capture using oligodT ( e.g. , those on Library Preparation for RNA Sequencing conjugated to beads ) , and rRNA depletion. Petrova et al . [ 0177 ] Methods of preparing cDNA libraries from a ( Scientific Reports volume 7 , Article number : 41114 ( 2017 ) sample of RNA are known in the art . For example , www . provides a comparison of various rRNA depletion methods illumina.com/content/dam/illumina-marketing/documents/ and is incorporated by reference herein in its entirety. In applications / ngs - library -prep / for- all - you - seq - rna.pdf pro some embodiments , rRNA depletion can be performed using vides illustrations of different methods for preparing cDNA enzymatic approaches ( e.g. , using an exonuclease that does libraries for RNA sequencing. Non - limiting examples of not process mRNA ). In some embodiments, rRNA depletion cDNA library preparation include ClickSeq , 3Seq , and cP method comprises subtractive hybridization , whereby rRNA RNA -Seq . In some embodiments, preparing a cDNA library is captured using sequence specific probes ( see e.g. , WWW . from RNA comprises purifying mRNA from the sample of sciencedirect.com/topics/immunology-and-microbiology/ RNA (RNA enrichment) . In some embodiments, enriched subtractive - hybridization ) RNA is fragmented . In some embodiments, after selection of [ 0182 ] In some embodiments, polyA capture comprises the appropriate RNA fraction is completed , the molecules capturing of mRNA bearing a polyA tail by using polyA are fragmented into smaller pieces , to a size between specific capture probes ( oligodT ) . In some embodiments, 50-1000 bp ( e.g. , 50-100 bp , 100-800 bp , 100-500 bp , or capture probes are immobilized for ease of purification . In 200-500 bp ) depending on the sequencing platform being some embodiments, capture probes are immobilized on used . This fragmentation can be achieved either by frag beads ( e.g. , magnetic beads ) . In some embodiments, a menting double - stranded ( ds ) cDNA or by fragmenting commercial kit is used to prepare a DNA library from an RNA . Both methods result in the same end product of a RNA sample . In some embodiments, an Illumina TruSeq double stranded cDNA library in which each fragment has RNA Library Prep kit is used . an adapter attached . [ 0183 ] The choice of mRNA enrichment can have a huge [ 0178 ] In some embodiments, a library preparation impact on the selection of transcripts that are sequenced . For method comprises one or more amplification steps to add example , in some embodiments, compared to rRNA deple function elements ( e.g. , sample indices, molecular barcodes tion methods, cDNA libraries prepared using polyA enrich or flow cell oligo binding sites ) , enrich for sequencing ment result in libraries comprising a higher fraction of competent DNA fragments, and / or rate a sufficient protein - coding transcripts ( e.g. , greater than 80 % , greater amount of library DNA for downstream processing. In some than 90 % , greater than 95 % , greater than 96 % , greater than embodiments, enriched RNA ( e.g. , fragmented enriched 97 % , greater than 98 % , greater than 99 % , or greater than RNA ) is amplified using random primers ( e.g., random 99.9 % ) compared to non - coding transcripts ( e.g. , rRNA , hexamers ). In some embodiments, enriched RNA ( e.g. , miRNA , and IncRNA ). fragmented enriched RNA ) is amplified using oligodTs . In [ 0184 ] In some embodiments, prepared cDNA libraries are some embodiments, RNA is then removed from the formed tested for quality. In some embodiments, quantification of cDNA . In some embodiments , cDNA is amplified to include libraries for use in sequencing is generally performed before sequencing adapters and indices ( i.e. , a plurality of indexes ). the libraries are pooled for target enrichment or amplifica An adapter is a DNA sequence of 10-100 bp ( e.g. , 10-20 , tion to ensure equal representation of indexed libraries in 10- 0 , 20-80 , 30-70 , 50 , 20-100 , 40-100 , 40-80 , 30-60 , multiplexed applications. In some embodiments, quantifica or 45-65 bp ) that can bind to a flow cell for sequencing . tion is also used to confirm that individual libraries or library Adapters also allow for PCR enrichment of adapter -ligated pools are diluted optimally prior to sequencing. Accurate DNA fragments. Adapters also can allow for indexing or and reproducible quantification of adapter - ligated library barcoding of samples so that multiple cDNA libraries can be molecules contributes to obtaining consistent and reproduc mixed together into one sequencing sample ( or lane ) ; i.e. , it ible results, and for maximizing sequencing yields . Loading allows for multiplexing. In some embodiments , an index or more than the recommended amount of DNA could lead to a barcode is 4-20 bp long ( e.g. , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , saturation of the flowcell or increased cluster density while 13 , 14 , 15 , 16 , 17 , 18 , 19 , 4-20,5-15 , 6-12 , or 4-12 bp long ) . loading too little DNA can lead to decreased cluster density tucf-genomics.tufts.edu/documents/protocols/TUCF_Un and reduced coverage and depth . derstanding_Illumina_TruSeq_Adapters.pdf provides [ 0185 ] Methods of quantifying DNA libraries include example protocols for preparing cDNA libraries using adapt electrophoresis, fluorometry, spectrophotometry, digital ers and indexing and is incorporated herein by reference in PCR , droplet -digital PCR and qPCR . Various instruments its entirety . Protocols for constructing a DNA or RNA library for measuring the quantity and / or quality of DNA libraries can at least be found in Example 3 and Example 5 . exist , e.g. , the Agilent High Sensitivity D1000 ScreenTape [ 0179 ] RNA Enrichment System . [ 0180 ] Methods for RNA enrichment to enrich for mRNA [ 0186 ] Aspects of the present disclosure provide quality ( herein also described as “ RNA enrichment” ) during cDNA control of nucleic acids for the sequencing analysis. Aspects library preparation are known in the art . RNA enrichment of the present disclosure provide quality control of DNA for can be targeted or non - targeting. Targeted methods of RNA the sequencing analysis. Aspects of the present disclosure enrichment include use of sequence - specific capture probes. provide quality control of RNA for the sequencing analysis . A non - limiting example of targeted mRNA enrichment In some embodiments, the nucleic acids can include any includes CaptureSeq (sapac.illumina.com/science/sequenc suitable types of DNA or RNA . In some embodiments, the ing -method - explorer / kits - and -arrays / captureseq.html), quality control of nucleic acid comprises the confirmation of which makes use of capture probes specific to sequences of biopsy condition and documents . In some embodiments , the US 2021/0005284 A1 Jan. 7 , 2021 18 confirmation of biopsy condition and documents can RNA extraction . In some embodiments, any suitable tech include , but are not limited to the inventory and registration nology or tool can be used for determining the quality of the of the nucleic acid materials . In some embodiments , the DNA or RNA extraction . confirmation of biopsy condition and documents include [ 0189 ] In some embodiments, the acceptable total DNA nucleic acid material acceptance . By way of example , amount for further DNA library construction is at least patient samples received from a healthcare provider are 200-1,000 ng ( e.g. , 200-1,000 ng , 300-1,000 ng , or 300-1 , confirmed whether the patient tissues are in the condition of 000 ng ) . In some embodiments, the target total DNA amount fresh frozen or formalin - fixed paraffin - embedded . The labo for further sequencing is more than 500-1,000 ng ( e.g. , ratory personnel verify the compliance of the biopsy of the 500-1,000 ng , 600-1,000 ng , or 800-1,000 ng ) . In some registered entity. The laboratory personnel verify the proper embodiments, the acceptable total RNA amount for further storage of the biopsy sample during transportation . The RNA library construction is at least 0.5-4 nmol/ l ( e.g. , laboratory personnel verify the physical condition of the 200-1,000 ng , 300-1,000 ng , or 300-1,000 ng ) . In some biopsy samples. In the event that the laboratory personnel embodiments , the target total RNA amount for further RNA identify any errors regarding the biopsy samples, the source library construction is at least 0.5-4 nmol/ l ( e.g. , 500-1,000 of the biopsy samples ( e.g. , a healthcare provider) may be ng , 600-1,000 ng , or 800-1,000 ng ) . notified . In some embodiments, if the received biopsy [ 0190 ] In some embodiments, the acceptable DNA con samples are patient tissue cell lines , the samples are prepared centration for further DNA library construction is at least 17 for extraction . In some embodiments , if the received biopsy ng / ul ( e.g. , 17 ng / ul , 25 ng / ul , 35 ng / ul ) . In some embodi samples are extracted DNA or RNA , the samples are stored ments, the target DNA concentration for further DNA library at -80 ° C. for further sequencing. In some embodiments, the construction is at least 42 ng / ul ( e.g. , 42 ng / ul, 50 ng / ul, 80 extracted DNA can be a reference gDNA . In some embodi ng/ ul ). In some embodiments, the acceptable RNA concen ments, the extracted RNA can be a reference RNA . tration for further RNA library construction is at least 0.1 [ 0187 ] In some embodiments , the quality control proce ng / ul ( e.g. , 0.1 ng / ul, 1 ng /ul , 3 ng /ul ) . In some embodi dures provide a target range . The target range may represent ments, the target RNA concentration for further RNA library the most ideal quality of a given step ( e.g. , extraction ). In construction is at least 0.1 ng / ul ( e.g. , 0.1 ng / ul, 1 ng / ul, 3 some embodiments, the quality control procedures provide ng/ ul ). In some embodiments , the DNA and RNA concen an acceptable range . The acceptable range may represent tration is detected by a fluorometer, for example for quan ideal or acceptable quality of a given step . In some embodi tification of DNA or RNA ( e.g. , a Qubit fluorometer avail ments, the quality control of nucleic acid comprises ensuring able from ThermoFisher Scientific , www.thermofisher.com ). the quality in the process of constructing a DNA library. In [ 0191 ] In some embodiments, the acceptable DNA con some embodiments , the quality control of nucleic acid centration for further DNA library construction is at least 15 comprises ensuring the quality in the process of constructing ng / ul ( e.g. , 15 ng / ul , 25 ng / ul , 35 ng / ul ) . In some embodi an RNA library. As shown in FIG . 7 and Example 6 , the ments, the target DNA concentration for further DNA library preparation of the DNA or RNA libraries comprises the construction is at least 402 ng /ul ( e.g. , 40 ng /ul , 50 ng / ul , 80 extraction of the DNA or RNA from the patient tissue ng / ul ). In some embodiments, the acceptable RNA concen samples . In some embodiments , a spectrophotometer, fo tration for further RNA library construction is at least 0.1 example a small volume full - spectrum , UV - visible spectro ng / ul ( e.g. , 0.1 ng / ul, 1 ng / ul, 3 ng / ul) . In some embodi photometer ( e.g. , Nanodrop spectrophotometer available ments, the target RNA concentration for further RNA library from ThermoFisher Scientific , www.thermofisher.com ) can construction is at least 0.1 ng / ul ( e.g. , 0.1 ng / ul, 1 ng /ul , 3 be used for determining the quality of the DNA or RNA ng / ul ). In some embodiments, the acceptable RNA concen extraction . By way of example, the extracted DNA at > 100 tration for further RNA library construction is at least 0.5 ng / ul shows that the extracted DNA passes the quality nmol /l ( e.g. , 0.5 nmol/ l, 1 nmol /l , 5 nmol/ l) . In some control test . The extracted RNA at > 500 ng / ul shows that the embodiments, the target RNA concentration for further extracted RNA passes the quality control test . In another RNA library construction is at least 0.5 nmol / l ( e.g. , 0.5 example , the ratio of absorbance at 260 nm and 280 nm nmol/ 1, 1 nmol/ 1, 5 nmol / l) . In some embodiments, the DNA ( 260/280 ) of the extracted DNA at 1.8-2.0 shows that the and RNA concentrations are detected by Tapestation . extracted DNA passes the quality control test . The ratio of [ 0192 ] In some embodiments , the acceptable RNA con absorbance at 260 nm and 280 nm ( 260/280 ) of the extracted centration for further RNA library construction is at least 0.5 RNA at 2.0 shows that the extracted RNA passes the quality nmol /l ( e.g. , 0.5 nmol/ l, 1 nmol /l , 5 nmol/ l) . In some control test . In another example, the ratio of absorbance at embodiments, the target RNA concentration for further 260 nm and 230 nm ( 260/230 ) of the extracted DNA at RNA library construction is at least 0.5 nmol /l ( e.g. , 0.5 2.0-2.2 shows that the extracted DNA passes the quality nmol/ 1, 1 nmol/ 1, 5 nmol /l ). In some embodiments, the DNA control test . The ratio of absorbance at 260 nm and 230 nm and RNA concentration is detected by a nucleic acid ampli ( 260/230 ) of the extracted RNA at 2.0-2.2 shows that the fication device ( e.g. , a PCR system ) , for example a real - time extracted RNA passes the quality control test . In some PCR system ( e.g. , a LightCycler Instrument available from embodiments, a fluorometer, for example for quantification Roche , www.lifescience.roche.com ). In some embodiments , of DNA or RNA ( e.g. , a Qubit fluorometer available from the DNA and RNA concentration can be detected by any ThermoFisher Scientific , www.thermofisher.com ) can be suitable technologies or tools . used for determining the quality of the DNA or RNA [ 0193 ] In some embodiments, if RNA is extracted , reverse extraction . can be performed . In some embodiments , an [ 0188 ] In some embodiments, an electrophoresis device , RNA library can be constructed after the reverse transcrip for example an automated electrophoresis device ( e.g. , a tion is performed . In some embodiments , a fluorometer, for TapeStation System available from Agilent, www.agilent. example for quantification of DNA or RNA ( e.g. , a Qubit com ) can be used for determining the quality of the DNA or fluorometer available from ThermoFisher Scientific , www . US 2021/0005284 A1 Jan. 7 , 2021 19 thermofisher.com ) can be used for determining the quality of pooling is at least 0.1 ng /ul ( e.g. , 0.1 ng / ul, 0.8 ng / ul, 4 the DNA or RNA libraries . In some embodiments, any ng / ul ) when an electrophoresis device, for example an suitable method can be used for determining the quality of automated electrophoresis device ( e.g. , a TapeStation Sys the DNA or RNA libraries. In some embodiments, an tem available from Agilent, www.agilent.com ) is used . In electrophoresis device , for example an automated electro some embodiments , the acceptable DNA concentration for phoresis device ( e.g. , a TapeStation System available from pooling is at least 0.5 nmol/ l ( e.g. , 0.5 nmol / 1, 0.8 nmol/ 1, 3 Agilent, www.agilent.com ) can be used for determining the nmol/ l) when an electrophoresis device , for example an quality of the DNA or RNA libraries . In some embodiments, automated electrophoresis device ( e.g. , a TapeStation Sys a fluorometer, for example for quantification of DNA or tem available from Agilent, www.agilent.com ) is used . In RNA ( e.g. , a Qubit fluorometer available from Ther some embodiments, the target DNA concentration for pool moFisher Scientific , www.thermofisher.com ) can be used for ing is at least 0.5 nmol /l ( e.g. , 0.5 nmol/ 1, 0.8 nmol/ 1, 3 determining the quality of the DNA or RNA extraction . In nmol /l ) when an electrophoresis device , for example an some embodiments , a nucleic acid amplification device automated electrophoresis device ( e.g. , a TapeStation Sys ( e.g. , a PCR system ) , for example a real - time PCR system tem available from Agilent, www.agilent.com ) is used . In ( e.g. , a LightCycler Instrument available from Roche , www . some embodiments , the acceptable and / or concentration of lifescience.roche.com ) can be used for determining the DNA is in the range of 380-440 ng ( e.g. , 380-440 ng , quality of the RNA library. In some embodiments, one or 400-440 ng , 420-440 ng ) when an electrophoresis device , more RNA libraries can be pooled . In some embodiments, if for example an automated electrophoresis device ( e.g. , a DNA is extracted , the extracted DNA can be used for a DNA TapeStation System available from Agilent, www.agilent. library construction . In some embodiments, the DNA frag com ) is used . In some embodiments , the acceptable DNA ments in the constructed DNA library can be hybridized concentration for pooling is at least 0.5 nmol / l ( e.g. , 0.5 and / or captured . In some embodiments, a fluorometer, for nmol/ 1, 0.8 nmol / 1, 3 nmol / l) when a nucleic acid amplifi example for quantification of DNA or RNA ( e.g. , a Qubit cation device ( e.g. , a PCR system ) , for example a real - time fluorometer available from ThermoFisher Scientific , www . PCR system ( e.g. , a LightCycler Instrument available from thermofisher.com ) can be used for determining the quality of Roche , www.lifescience.roche.com ) is used . In some the DNA hybridization and capture step . In some embodi embodiments , the target DNA concentration for pooling is at ments, an electrophoresis device , for example an automated least 0.5 nmol /l ( e.g. , 0.5 nmol/ 1, 0.8 nmol/ 1 , 3 nmol/ l) when electrophoresis device ( e.g. , a TapeStation System available LightCycler is used . from Agilent, www.agilent.com ) can be used for determin [ 0196 ] In some embodiments , the quality control of ing the quality of the DNA hybridization and capture step . nucleic acid comprises ensuring the quality after DNA or In some embodiments, a nucleic acid amplification device RNA library construction such as during the sequencing ( e.g. , a PCR system ) , for example a real - time PCR system process. In some embodiments, cluster density can be a ( e.g. , a LightCycler Instrument available from Roche , www . parameter for quality control of the sample run ( Example 6 ) . lifescience.roche.com ) can be used for determining the Cluster density is an important factor in optimizing data quality of the DNA hybridization and capture step . In some quality and yield of the sequencing. Without wishing to be embodiments , any suitable method can be used for deter bound by any theory, an optimal cluster density at least mining the quality of the DNA hybridization and capture shows the DNA or RNA libraries are balanced . In some step . In some embodiments, one or more DNA libraries can embodiments , quality score and signal/ noise ratio can be be pooled . In some embodiments, an electrophoresis device , parameters for quality control of the sample run . for example an automated electrophoresis device ( e.g. , a [ 0197 ] In some embodiments, the quality control of TapeStation System available from Agilent, www.agilent. nucleic acid comprises ensuring the quality of sequencing . com ) can be used for determining the quality of the DNA or In some embodiments, the quality control of sequencing RNA library pooling . In some embodiments, any suitable comprises bioinformatics quality control. In some embodi methods can be used for determining the quality of the DNA ments , the sequencing can be a DNA sequencing. In some or RNA library pooling . embodiments, the sequencing can be RNA sequencing . In [ 0194 ] In some embodiments, the acceptable and / or target some embodiments, the sequencing can be any type of final DNA concentration range for pooling is at least 0.5-4 sequencing technologies known in the art for determining nmol/ l ( e.g. , 0.5-4 nmol/ 1, 0.5-3 nmol/ 1, 2-4 nmol /l ) . In some the DNA or RNA expression profiles of a given biological embodiments, the acceptable DNA concentration for pool sample . By way of example , the sequencing can be a ing is at least 0.1 ng / ul ( e.g. , 0.1 ng / ul, 0.8 ng / ul, 4 ng / ul) whole -exome sequencing. The sequencing can be a tran when a fluorometer, for example for quantification of DNA scriptome sequencing. The sequencing can be a Sanger or RNA ( e.g. , a Qubit fluorometer available from Ther sequencing. moFisher Scientific, www.thermofisher.com ) is used . In [ 0198 ] In some embodiments, up to 1 ng ( e.g. , up to 0.1 , some embodiments , the target DNA concentration for pool up to 0.2 , up to 0.3 , up to 0.4 , up to 0.5 , up to 0.6 , up to 0.7 , ing is at least 0.1 ng / ul ( e.g. , 0.1 ng / ul, 0.8 ng / ul, 4 ng / ul) up to 0.8 , up to 0.9 , or up to 1 ng ) of library in up to 2 ul ( e.g. , when a fluorometer, for example for quantification of DNA up to 0.1 ul, up to 0.5 ul , up to 0.8 ul, up to 0.9 ul , up to 1 or RNA ( e.g. , a Qubit fluorometer available from Ther ul , up to 1.2 ul, up to 1.4 ul , up to 1.5 ul , up to 1.8 ul , or up moFisher Scientific , www.thermofisher.com ) is used . to 2 ul ) of solution is used for quality control testing . In some [ 0195 ] In some embodiments , the acceptable DNA con embodiments, the parameters that are tested include sizes centration for pooling is at least 0.1 ng / ul ( e.g. , 0.1 ng / ul, 0.8 and size distribution of DNA molecules , and purity . ng / ul , 4 ng / ul) when an electrophoresis device, for example [ 0199 ] In some embodiments, a standard method of pre an automated electrophoresis device ( e.g. , a TapeStation paring a library of cDNA fragments from RNA fails to System available from Agilent, www.agilent.com ) is used . preserve the information pertaining to which DNA strand In some embodiments, the target DNA concentration for was the original template during transcription and subse US 2021/0005284 A1 Jan. 7 , 2021 20 quent synthesis of the mRNA transcript. Since antisense Library Preparation of WES transcripts are likely to have regulatory roles that are dis [ 0205 ] An “ exome ” is the sum of all regions in the genome tinctly different from their protein coding complement, this comprised of exons . Exons are DNA regions that are tran loss of strand information results in an incomplete under scribed into messenger RNA , as opposed to introns which standing of the transcriptome. Strand - specific RNA - Seq can are removed by splicing proteins . Exome sequencing is a be performed to preserve this strandedness . Methods of capture -based method developed to identify variants in the preserving strandedness and preparing cDNA fragment coding region of genes that affect protein function . As the libraries for it are known in the art ( see e.g. , Mills et al . coding portion of the genome encompasses only 1-2 % of the Strand -Specific RNA -Seq Provides Greater Resolution of entire genome, this approach represents a cost - effective Transcriptome Profiling; Curr Genomics. 2013 May ; 14 ( 3 ) : strategy to detect DNA alterations that may alter protein 173-181 ) . In some embodiments , library preparation for function , compared to whole genome sequencing . In some stranded RNA - seq makes use of known orientation strand embodiments, whole exome sequencing ( WES ) comprises specific adapters . In some embodiments, strands are chemi preparation of a library of DNA fragments for sequencing cally modified to preserve knowledge of their origin . from a sample of DNA . In some embodiments, DNA is first [ 0200 ] In some embodiments , methods making use of fragmented to the appropriate size ( depending on the adapters include strand -specific 3 ' - end RNA - Seq . In some sequencing platform used ) and then sequencing platform embodiments, strand - specific 3 '- end RNA -Seq comprises specific adapters are added . In some embodiments, libraries anchored oligo ( dT ) primers are first used to select for are amplified before the next step in the process ( target mRNA , which results in production of double - stranded enrichment or sequencing ). cDNA molecules . Adapters for paired -end sequencing are [ 0206 ] Kits are commercially available for the preparation then ligated to each end of the cDNA molecule . Subse of libraries, non - limiting examples of which include KAPA quently, the fragments are sequenced generating pair - end HyperPrep Kits , Agilent HaloPlex , Agilent SureSelect QXT, reads that are aligned to a reference genome . Any aligned IDT XGEN Exome, Illumina Nextera Rapid Capture Exome , read that contains a stretch of adenines at the end of the Roche Nimblegen SeqCap , and MYcroarray MYbaits . In transcript must be a transcript that originated from the DNA some embodiments, any kit that can prepare a DNA library antisense strand, while any reads that align with a stretch of for WES can be used . For example , an Agilent Human All thymines at the front must be a transcript from the DNA Exon V6 Capture Kit ( www.agilent.com/cs/library/ sense strand . datasheets / public / SureSelect % 20V6 % 20DataSheet % 205991-5572EN.pdf) is [ 0201 ] In some embodiments, methods making use of used to prepare a DNA library for WES . In some embodi adapters makes use of single - stranded ( ss ) cDNA and Illu ments, a Clinical Research Exome kit ( www.agilent.com/ mina adapters and 4 DNA that allows for linking of 3 ' en / promotions / clinical- research - exome- v2 ) is used . Quanti and 5 ' adapters to ssDNA . As the second strand is never ties of DNA needed depend on the specific reagents used to synthesized and does not proceed to sequencing, strand prepare the library . For example , 100 ng of genomic DNA information is retained . is sufficient for Agilent SureSelect XT2 V6 Exome , but 500 [ 0202 ] In some embodiments, any suitable technologies or ng of genomic DNA is required for IDT XGEN Exome tools can be used for preserving strandedness. For example , Panel . A comparison of various capture kits are provided in Flowcell reverse transcription sequencing ( FRT- Seq ) can be www.genohub.com/exome-sequencing-library-preparation . used to preserve strandedness. In some embodiments , FRT [ 0207 ] In some embodiments, a library preparation Seq or the equivalent technologies comprises ligation of method comprises one or more amplification steps to add adapters to either end of fragmented and purified polyade function elements ( e.g. , sample indices , molecular barcodes nylated mRNA . In some embodiments, each adapter com or flow cell oligo binding sites ) , enrich for sequencing prises two regions ; a region to which the sequencing primers competent DNA fragments, and / or generate a sufficient anneal and a region that is complementary to the oligonucle amount of library DNA for downstream processing. By way otides present on the flowcell. The complementary region of example , a library preparation method is shown in allows the mRNA fragment to hybridize to the flowcell . The Example 3 and Example 5 . mRNA fragments are then reverse transcribed on the flow [ 0208 ] In some embodiments, prepared DNA libraries are cell surface . tested for quality. In some embodiments, quantification of libraries for use in sequencing is generally performed before [ 0203 ] Other non - limiting adapter -based methods of pre the libraries are pooled for target enrichment or amplifica serving strandedness include direct strand -specific sequenc tion to ensure equal representation of indexed libraries in ing ( DSSS ) and SOLID® Total RNA - Seq Kit ( tools.ther multiplexed applications . In some embodiments, quantifica mofisher.com/content/sfs/manuals/cros_078610.pdf) that tion is also used to confirm that individual libraries or library preserves strand specificity through the addition of adapters pools are diluted optimally prior to sequencing. Accurate in a directional manner . and reproducible quantification of adapter - ligated library [ 0204 ] In some embodiments , chemical modification of molecules contributes to obtaining consistent and reproduc strands to preserve knowledge of their origin comprises ible results, and for maximizing sequencing yields . Loading marking the original RNA template through the use of more than the recommended amount of DNA could lead to bisulfite treatment. In some embodiments , dUTPs are incor saturation of the flowcell or increased cluster density while porated into reverse transcription reaction , resulting in ds loading too little DNA can lead to decreased cluster density cDNA where the original strand has deoxythymidine resi and reduced coverage and depth . dues while the complementary strand contains deoxyuridine [ 0209 ] Methods of quantifying DNA libraries include residues . Uracil - DNA - (UDG ) treatment can electrophoresis, fluorometry, spectrophotometry, digital then be used to degrade complementary strands . PCR , droplet -digital PCR and qPCR . Various instruments US 2021/0005284 A1 Jan. 7 , 2021 21 for measuring the quantity and / or quality of DNA libraries embodiments , direct RNA sequencing comprises single mol exist, e.g. , the Agilent High Sensitivity D1000 Screen Tape ecule RNA sequencing (DRSTM ). System . [ 0216 ] In some embodiments , RNA sequencing is mRNA [ 0210 ] In some embodiments, prepared DNA libraries are sequencing. In some embodiments, mRNA sequencing is the tested for quality. In some embodiments up to 1 ng ( e.g. , up sequencing of only coding transcripts with the goal to to 0.1 , up to 0.2 , up to 0.3 , up to 0.4 , up to 0.5 , up to 0.6 , up exclude non - coding regions. In some embodiments, mRNA to 0.7 , up to 0.8 , up to 0.9 , or up to 1 ng ) of library in up to sequencing is independent of polyA enrichment. In some 2 ul ( e.g. , up to 0.1 ul, up to 0.5 ul , up to 0.8 ul , up to 0.9 embodiments, mRNA sequencing depends on polyA enrich ul , up to 1 ul , up to 1.2 ul, up to 1.4 ul , up to 1.5 ul , up to ment . 1.8 ul , or up to 2 ul) of solution is used for quality control [ 0217 ] In some embodiments , RNA is extracted from a testing . In some embodiments, the parameters that are tested biological sample, mRNA is enriched from the extracted include sizes and size distribution of DNA molecules , and RNA , DNA libraries are constructed from the enriched purity. mRNA . In some embodiments , single pieces of cDNA from a cDNA library are attached to a solid matrix . In some RNA Sequencing embodiments, single pieces of cDNA from a cDNA library [ 0211 ] RNA sequencing is a tool to measure the transcrip are attached to a solid matrix by limited dilution . In some tome . The transcriptome is comprised of different popula embodiments, cDNA pieces attached to a matrix are then tions of RNA molecules, including mRNA , rRNA , RNA , sequenced ( e.g. , using Pacbio or Pacifbio technology ). In and other non - coding RNA ( such as microRNA , IncRNA ). some embodiments, cDNA pieces that are attached to a In some embodiments , RNA sequencing is used to profile matrix are amplified and sequenced ( e.g. , using a specialized the transcriptome ( e.g., the coding and /or non -coding emulsion PCR ( emPCR ) in SOLID , 454 Pyrosequencing, regions ) . In some embodiments, it is used to identify genes Ion Torrent, or a connector based on the bridging reaction that are differentially expressed in different biological ( Illumina ) platforms ). samples ( e.g. , cells , tissue , or bodily fluid ). In some embodi [ 0218 ] In some embodiments, cDNA transcripts can be ments , RNA sequencing is used to determine the genetic sequenced in parallel , either by measuring the incorporation effects of splicing events, identify novel transcripts, detect of fluorescent nucleotides ( for example, Illumina ), fluores structural variations ( e.g. , gene fusions and isoforms ), and /or cent short linkers ( for example , SOLID ) , by the release of to detect single nucleotide variants. the by - products derived from the incorporation of normal [ 0212 ] In some embodiments, the term “ RNA sequencing ” nucleotides ( 454 ) , by measuring fluorescence emissions, or can be used interchangeably with “ RNA seq , ” “ RNA -seq . ” by measuring pH change ( for example, Ion Torrent ). In some or the variations thereof as known in the art referring to any embodiments, cDNA transcripts can be sequenced using any technologies, tools , or platforms that interrogate the tran known sequencing platform . Jazayeri et al. ( RNA - seq : a scriptome . It is noted that when “ RNA sequencing , ” “ RNA glance at technologies and methodologies; Acta biol . seq, ” “ RNA - seq , " or the variations thereof is referred in the Colomb. vol . 20 no . 2 Bogotá May / August 2015 ) provides present disclosure, it does not refer to a specific technology a comparison of different RNA - seq platforms, and is incor or tool that is associated with a particular platform or porated herein by reference in its entirety , including Table 3 company, unless indicated otherwise by way of non - limiting and Table 4. Mestan et al . ( Genomic sequencing in clinical examples for demonstrating the processes or systems as trials; Journal of Translational Medicine 2011 , 9 : 222 ) pro described herein . In some embodiments, RNA sequencing vides a similar analysis for sequencing in clinical trials . can be conducted by using any suitable encing plat ( 0219 ] In some embodiments, RNA sequencing is forms and / or sequencing methods. Non - limiting examples stranded or strand - specific . cDNA synthesis from RNA of high - throughput sequencing platforms include mRNA results in loss of strandedness . In some embodiments, seq , total RNA - seq , targeted RNA -seq , single -cell RNA strandedness is preserved by chemically labeling either or Seq , RNA exome capture platform , or small RNA - seq ( e.g. , both the RNA strand and the cDNA strand that is formed by Illumina , www.illumina.com ) , SMRT ( single molecule , real reverse transcription or antisense transcription , or by using time) sequencing ( e.g. , Pacific Biosciences, https : //www . adapter- based techniques to distinguish the original RNA pacb.com ) , and RNA sequencing ( e.g. , ThermoFisher , www . strand from the complementary DNA strand , as described thermofisher.com ). above . [ 0213 ] As described above , RNA sequencing can be tar [ 0220 ] In some embodiments , nonstranded RNA sequenc geted or untargeted . Targeted approaches include using ing is performed . In some embodiments, stranded RNA -seq sequence - specific probes or oligonucleotides to sequence should be avoided for clinical samples. In some embodi one or more specific regions of the transcriptome. In some ments , nonstranded RNA - seq is used to compare data embodiments , targeted RNA sequencing includes methods obtained from a biological sample to RNA sequencing data such as mRNA enrichment ( e.g. , by polyA enrichment or in established data sets ( e.g. , The Cancer Genome Atlas rRNA depletion ) . ( TCGA ) and International Cancer Genome Consortium [ 0214 ] In some embodiments, RNA sequencing is whole ( ICGC ) ) . transcriptome sequencing. Whole transcriptome sequencing [ 0221 ] In some embodiments , RNA sequencing yields comprises measurement of the complete complement of paired - end reads . Paired - end reads are reads of the same transcripts in a sample. In some embodiments, whole tran nucleic acid fragment and are reads that start from either end scriptome sequencing is used to determine global expression of the fragment. In some embodiments , RNA sequencing is levels of each transcript ( e.g. , both coding and non - coding ) , performed with paired - end reads of at least 2x25 ( 2x25 , identify exons , introns and / or their junctions. 2x50 , 2x75 , 2x100 , 2x125 , 2x150 , 2x175 , 2x200 , 2x225 , [ 0215 ] In some embodiments, RNA is sequenced directly 2x250 , 2x275 , 2x300 , 2x325 , or 2x350 ) paired - end reads . In without preparing cDNA from a sample of RNA . In some some embodiments , RNA sequencing is performed with US 2021/0005284 A1 Jan. 7 , 2021 22 paired - end reads of at least 2x75 paired -end reads . RNA [ 0229 ] Any high - throughput DNA sequencing platform sequencing with 2x75 paired - end reads means that on aver and /or method can be used in any one of the methods age each read, which is paired - end , reads 75 base pairs . In described herein . In some embodiments , DNA sequencing some embodiments , RNA sequencing is performed with a can be conducted by using any suitable platforms and / or total of at least 20 million ( e.g. , at least 20 million , at least methods. Non - limiting examples of high - throughput 30 million , at least 40 million , at least 50 million , at least 60 sequencing methods include Single - molecule real- time million , at least 70 million at least 80 million , at least 90 sequencing, Ion semiconductor ( Ion Torrent sequencing ), million , at least 100 million , at least 120 million , at least 140 Pyrosequencing ( i.e. , 454 ) , Sequencing by synthesis ( Illu million , at least 150 million , at least 160 million , at least 180 million , at least 200 million , at least 250 million , at least 300 mina ) , Illumina ( Solexa ) sequencing, Combinatorial probe million , at least 350 million , or at least 400 million ) paired anchor synthesis ( CPAS - BGI /MGI ), Sequencing by ligation end reads . In some embodiments , RNA sequencing is per ( SOLID sequencing ), Nanopore Sequencing ( e.g. , using an formed with a total of at least 50 million paired - end reads . instrument from Oxford Nanopore Technologies, Chain ter In some embodiments , RNA sequencing is performed with mination ( Sanger sequencing ), massively parallel signature a total of at least 100 million paired - end reads . sequencing ( MPSS ) pology sequencing, Heliscope single [ 0222 ] In some embodiments, quality control is performed molecule sequencing , and Single molecule real time for RNA sequencing. In some embodiments, cluster density ( SMRT ) sequencing ( e.g. , using an instrument from Pacific or cluster PF % is a parameter for determining the quality of Biosciences ). Other non - limiting examples of high - through the sample run . In some embodiments, the target range of put sequencing techniques include Tunnelling currents DNA cluster density or cluster PF % is at least 170-220 ( e.g. , sequencing, sequencing by hybridization , Sequencing with 170-220 , 190-220 , 210-220 ) . In some embodiments, the mass spectrometry, Microfluidic Sanger sequencing, and acceptable range of cluster density or cluster PF % is at least RNAP sequencing . 280 ( e.g. , 280 , 300 , 450 ) . [ 0230 ] In some embodiments, DNA sequencing yields [ 0223 ] In some embodiments, % Q30 is a parameter for paired - end reads . Paired - end reads are reads of the same determining the quality of the sample run . In some embodi nucleic acid fragment and are reads that start from either end ments, the target % zQ30 is at least 85 % ( e.g. , 85 % , 90 % , of the fragment. In some embodiments , DNA sequencing is 95 % ) . In some embodiments, the acceptable % 2Q30 is at performed with paired -end reads of at least 2x25 ( 2x25 , least 75 % ( e.g. , 75 % , 85 % , 95 % ) . 2x50 , 2x75 , 2x100 , 2x125 , 2x150 , 2x175 , 2x200 , 2x225 , [ 0224 ] In some embodiments, error rate % is a parameter 2x250 , 2x275 , 2x300 , 2x325 , or 2x350 ) paired - end reads. In for determining the quality of the sample run . In some some embodi DNA sequencing is performed with embodiments , the target error rate % is less than 0.7 % ( e.g. , paired - end reads of at least 2x75 paired - end reads. DNA 0.6 % , 0.5 % , 0.4 % ) . In some embodiments, the acceptable sequencing with 2x75 paired - end reads means that on aver error rate % is less than 1 % ( e.g. , 0.9 % , 0.8 % , 0.7 % ) . age each read , which is paired - end , reads 75 base pairs . In some embodiments, DNA sequencing is performed with a Whole Exome Sequencing ( WES ) total of at least 20 million ( e.g. , at least 20 million , at least 30 million , at least 40 million , at least 50 million , at least 60 [ 0225 ] Whole exome sequencing ( WES ) is a genomic million , at least 70 million at least 80 million , at least 90 technique for sequencing all of the protein - coding region of million , at least 100 million , at least 120 million , at least 140 genes in a genome. In some embodiments, WES is per million , at least 150 million , at least 160 million , at least 180 formed to identify genetic variants that alter protein million , at least 200 million , at least 250 million , at least 300 sequences. In some embodiments, WES is performed to million , at least 350 million , or at least 400 million ) paired identify genetic variants that alter protein sequences at a cost end reads. In some embodiments, DNA sequencing is per that is lower than the cost of whole genome sequencing. formed with a total of at least 50 million paired - end reads. [ 0226 ] In some embodiments, whole exome sequencing In some embodiments, DNA sequencing is performed with ( WES ) is performed on a sample of DNA that has been a total of at least 100 million paired - end reads. In some extracted from a biological sample. In some embodiments , embodiments, DNA sequencing is performed so that at least a library of DNA fragments is prepared from the sample of a 20x ( e.g. , at least a 20x , at least a 30x , at least a 40x , at extracted DNA . In some embodiments , any one of the least a 50x , at least a 60x , at least a 70x , at least a 80x , at methods described herein comprises performing whole least a 90x , at least a 1000x , at least a 120x , at least a 125x , exome sequencing ( WES ) on a library of DNA fragments. at least a 150x , at least a 175x , at least a 200x , at least a Preparation of DNA libraries from a sample of DNA for 250x , at least a 300x , or at least a 400x ) coverage is yielded. WES is described above . Coverage, which also is referred to as depth , is the number [ 0227 ] In some embodiments, libraries of DNA are quan of times , on average , a single in a sample of nucleic tified before sequencing ( e.g. , using next -generation acid is read or sequenced . In some embodiments, the portion sequencing (NGS ) ). In some embodiments , DNA libraries of the genome that is targeted for capture and sequencing is are pooled before sequencing. In some embodiments, DNA at least 10 Mb ( e.g. , at least 10 Mb , at least 20 Mb , at least libraries are amplified before sequencing. In some embodi 30 Mb , at least 40 Mb, at least 50 Mb , at least 60 Mb , at least ments, DNA libraries are indexed before sequencing to keep 70 Mb , at least 80 Mb , at least 90 Mb , at least 100 Mb , at track of the origin of a DNA fragment. least 120 Mb , at least 150 Mb , at least 200 Mb , at least 250 [ 0228 ] In some embodiments, WES comprises target- en Mb, at least 300 Mb, or at least 350 Mb ) . In some embodi richment allowing the selective capture of genomic regions ments , the portion of the genome that is targeted for capture of interest prior to sequencing . In some embodiments, and sequencing is at least 48 Mb ( e.g. , after using the Agilent array - based capture is used ( e.g. , using microarrays ). In Human All Exon V6 Capture system ). In some embodi some embodiments, in - solution capture is used . ments , the portion of the genome that is targeted for capture US 2021/0005284 A1 Jan. 7 , 2021 23 and sequencing is at least 54 Mb ( e.g. , after using the biological samples, extracting RNA and / or DNA from a Clinical Research Exome capture system ( Agilent ) ). biological sample , testing the quality and quantity of [ 0231 ] In some embodiments , quality control is performed extracted RNA and / or DNA samples and / or DNA libraries for whole - exome sequencing. In some embodiments, cluster prepared therefrom , preparing single - cell solutions from a density or cluster PF % is a parameter for determining the biological sample , and preparing DNA libraries from quality of the sample run . In some embodiments, the target extracted RNA and / or DNA . range of cluster density or cluster PF % is at least 170-220 [ 0240 ] In some embodiments, any one of the kits ( e.g. , 170-220 , 190-220 , 210-220 ) . In some embodiments, described herein comprises components for making a cell the acceptable range of cluster density or cluster PF % is at dissociation cocktail. A cell - dissociation cocktail may be least 280 ( e.g. , 280 , 300 , 450 ) . enzymatic or non - enzymatic . In some embodiments, a kit [ 0232 ] In some embodiments, actual yield is a parameter comprises one or more enzyme cocktails. In some embodi for determining the quality of the sample run . In some ments , a kit comprises any one or more of the following embodiments, the target actual yield is at least 15 Gbp ( e.g. , components: media ( e.g. , L - 15 media ), antibacterials ( e.g. , 15 Gbp, 20 Gbp , 30 , Gbp ). penicillin and / or streptomycin ), anti - fungals ( e.g. , ampho [ 0233 ] In some embodiments , % 2Q30 is a parameter for terecin ), collagenase ( e.g. , collagenase I , collagenase II , determining the quality of the sample run . In some embodi collagenase IV ) , DNAse ( e.g. , DNAse I ) , elastase , ments , the target % Q30 is at least 85 % ( e.g. , 85 % , 90 % , hyaluronidase , and proteases ( e.g. , protease XIV , trypsin , 95 % ) . In some embodiments, the acceptable % 2Q30 is at papain , or termolysin ). In some embodiments, any one of the least 75 % ( e.g. , 75 % , 85 % , 95 % ) . kits described herein comprises one or more of the following [ 0234 ] In some embodiments, error rate % is a parameter enzymes : collagenase I and Collagenase IV . In some for determining the quality of the sample run . In some embodiments , these enzymes are comprised in separate embodiments, the target error rate % is less than 0.7 % ( e.g. , containers . In some embodiments , these enzymes are com 0.6 % , 0.5 % , 0.4 % ) . In some embodiments, the acceptable prised in a single container. error rate % is less than 1 % ( e.g. , 0.9 % , 0.8 % , 0.7 % ) . [ 0241 ] In some embodiments, a kit comprises a smaller equipment such as a spectrophotometer. In some embodi Reagents and Kits ments, a kit comprises instructions for performing any one [ 0235 ] Contemplated herein are reagents and kits com of, or a combination of any two or more of the following: prising reagents for performing any one of the methods storing biological samples, extracting RNA and / or DNA described herein . In some embodiments, a kit as provided from a biological sample , testing the quality and quantity of herein comprises reagents ( e.g. , buffers, preservatives , extracted RNA and / or DNA samples and /or DNA libraries inhibitors, or enzymes ) and / or labware ( e.g. , pipettes , filters, prepared therefrom , preparing single - cell solutions from a tubes, storage containers such as vacutainers, or dissection biological sample, and preparing DNA libraries from tools ) for storing biological samples obtained from a subject. extracted RNA and / or DNA . In some embodiments, a kit [ 0236 ] In some embodiments, a kit as provided herein comprises instructions for performing any one of the meth comprises reagents ( e.g., buffers, preservatives, inhibitors or ods described herein . In some embodiments , a kit is fash enzymes ) and / or labware ( e.g. , pipettes, filters, or tubes ) for ioned or tailored for specific tissue types, e.g. , biopsies of a extracting RNA and / or DNA from a biological sample or a solid tumor, a liquid biopsy, a blood sample , or urine . sample derived from a biological sample ( e.g. , a single - cells solution ). In some embodiments , a kit as provided herein Data Processing comprises reagents ( e.g. , buffers, preservatives, inhibitors, [ 0242 ] Aspects of this disclosure relate to processing data enzymes or dyes) and / or labware ( e.g. , pipettes , filters , obtained from RNA sequencing. In some embodiments, a tubes, storage containers, or electrophoresis paper) for mea method to process RNA expression data ( e.g. , data obtained suring the quality and quantity of RNA and / or DNA from RNA sequencing ( also referred to herein as RNA - seq extracted from a biological sample . In some embodiments, data )) comprises aligning and annotating genes in RNA a kit as provided herein comprises reagents ( e.g. , buffers, expression data with known sequences of the human preservatives , inhibitors, enzymes or dyes ) and / or labware genome to obtain annotated RNA expression data ; removing ( e.g. , pipettes, filters, tubes , storage containers, or electro non - coding transcripts from the annotated RNA expression phoresis paper) for measuring the quality and quantity of data; converting the annotated RNA expression data to gene DNA libraries for sequencing ( e.g. , RNA - seq or WES ) . expression data in transcripts per kilobase million ( TPM ) [ 0237 ] In some embodiments, a kit as provided herein format; identifying at least one gene that introduces bias in comprises reagents ( e.g. , buffers, preservatives, inhibitors or the gene expression data; and removing the at least one gene enzymes ) and / or labware ( e.g. , pipettes, filters, tubes, stor from the gene expression data to obtain bias - corrected gene age containers such as vacutainers , or dissection tools ) for expression data . In some embodiments, a method to process preparing a single - cell solution from a biological sample . RNA expression data comprises obtaining RNA expression [ 0238 ] In some embodiments, a kit as provided herein data for a subject having or suspected of having cancer . comprises reagents ( e.g. , buffers , inhibitors , or enzymes [ 0243 ] In some embodiments, non -coding transcripts may such as reverse transcriptase enzyme) and / or labware ( e.g. , comprise genes that belong to groups selected from the list pipettes , filters, tubes, storage containers ) for preparing consisting of: pseudogenes, polymorphic pseudogenes, pro DNA libraries for sequencing . cessed pseudogenes, transcribed processed pseudogenes, [ 0239 ] In some embodiments , a kit as provided herein unitary pseudogenes , unprocessed pseudogenes, transcribed comprises reagents ( e.g. , buffers, preservatives, inhibitors or unitary pseudogenes, constant chain immunoglobulin ( IGC ) enzymes ) and / or labware ( e.g. , pipettes, filters, tubes, stor pseudogenes, joining chain immunoglobulin ( IG J ) pseudo age containers such as vacutainers , or dissection tools ) for genes , variable chain immunoglobulin ( IG V ) pseudogenes , any combination of two or more of the following: storing transcribed unprocessed pseudogenes , translated unpro US 2021/0005284 A1 Jan. 7 , 2021 24 cessed pseudogenes , joining chain T cell receptor ( TR J ) ing and annotating genes in the RNA expression data with pseudogenes, variable chain T cell receptor ( TR V ) pseudo known sequences of the human genome to obtain annotated genes, small nuclear RNAs ( snRNA ), small nucleolar RNAS RNA expression data . ( snoRNA ), microRNAs (miRNA ), ribozymes, ribosomal [ 0251 ] In some embodiments , alignment of RNA expres RNA ( TRNA ), mitochondrial tRNAs ( Mt tRNA ), mitochon sion data comprises aligning the data to a known assembled drial rRNAs ( Mt rRNA ), small Cajal body - specific RNAs genome for a particular species of subject ( e.g. , the genome ( scaRNA ), retained introns, sense intronic RNA , sense over of a human ) or to a transcriptome database . Various lapping RNA , nonsense -mediated decay RNA , non - stop sequence alignment software are available and can be used decay RNA , antisense RNA , long intervening noncoding to align data to an assembled genome or a transcriptome RNAs ( lincRNA ), macro long non - coding RNA ( macro database . Non - limiting examples of alignment software IncRNA ), processed transcripts, 3prime overlapping non includes short ( unspliced ) aligners ( e.g. , BLAT; BFAST, coding RNA ( 3prime overlapping ncrna ), small RNAs Bowtie , Burrows -Wheeler Aligner, Short Oligonucleotide ( sRNA ), miscellaneous RNA (misc RNA ), vault RNA (vaul Analysis package , or Mosaik ) , spliced aligners, aligners tRNA ), and TEC RNA . based on known splice junctions ( e.g. , Errange , IsoformEx , [ 0244 ] In some embodiments, information ( e.g. , sequence or Splice Seq ) , or de novo splice aligner ( e.g. , ABMapper, information ) for one or more transcripts for one of more of BBMap , CRAC , or HISAT ). In some embodiments, any these types of transcripts can be obtained in a nucleic acid suitable tool can be used for aligning and annotating data . database ( e.g. , a Gencode database , for example Gencode For example, Kallisto ( github.com/pachterlab/kallisto ) is V23 , Genbank database , EMBL database , or other data used to align and annotate data . In some embodiments , a known genome is referred to as a reference genome. A base ) . reference genome ( also known as a reference assembly ) is a [ 0245 ] In some embodiments , a method to process RNA digital nucleic acid sequence database , assembled as a expression data ( e.g. , data obtained from RNA sequencing representative example of a species ' set of genes . In some ( also referred to herein as RNA - seq data ) ) comprises iden embodiments, human and mouse reference genomes used in tifying a cancer treatment ( also referred to herein as an any one of the methods described herein are maintained and anti - cancer therapy ) for the subject using the bias - corrected improved by the Genome Reference Consortium ( GRC ) . gene expression data . In some embodiments, any one of the Non - limiting examples of human reference releases are methods of processing RNA expression data is further GRCh38 , GRCh37 , NCBI Build 36.1 , NCBI Build 35 , and combined with administering to a subject one or more NCBI Build 34. A non - limiting example of transcriptome anti - cancer therapy or cancer treatment. In some embodi databased include Transcriptome Shotgun Assembly ( TSA ) . ments, any one of the methods of processing RNA expres [ 0252 ] In some embodiments, annotating RNA expression sion data is further combined with directing or recommend data comprises identifying the locations of genes and / or ing the administering to a subject one or more anti - cancer coding regions in the data to be processed by comparing it therapy or cancer treatment. to assembled genomes or transcriptome databases. Non [ 0246 ] Obtaining RNA Expression Data limiting examples of data sources for annotation include [ 0247 ] In some embodiments , a method to process RNA GENCODE (www.gencodegenes.org ), RefSeq ( see e.g. , expression data ( e.g. , data obtained from RNA sequencing www.ncbi.nlm.nih.gov/refseq/ ), and Ensembl. In some ( also referred to herein as RNA - seq data )) comprises obtain embodiments, annotating genes in RNA expression data is ing RNA expression data for a subject ( e.g. , a subject who based on a GENCODE database ( e.g. , GENCODE V23 has or has been diagnosed with a cancer ). In some embodi annotation ; www.gencodegenes.org ). ments , obtaining RNA expression data comprises obtaining [ 0253 ] Consea et al . ( A survey of best practices for RNA a biological sample and processing it to perform RNA seq data analysis ; Genome Biology201617 : 13 ) provides best sequencing using any one of the RNA sequencing methods practices for analyzing RNA - seq data , which are applicable described herein . In some embodiments , RNA expression to any one of the methods described herein and is incorpo data is obtained from a lab or center that has performed rated herein by reference in its entirety. Pereira and Rueda experiments to obtain RNA expression data ( e.g. , a lab or ( bioinformatics-core-shared-training.github.io/cruk-bioinf center that has performed RNA - seq ). In some embodiments , sschool/ Day2 / rnaSeq_align.pdf) also describe methods for a lab or center is a medical lab or center. analyzing RNA sequencing data, which are applicable to any [ 0248 ] In some embodiments, RNA expression data is one of the methods described herein , and is incorporated obtained by obtaining a computer storage medium ( e.g. , a herein by reference in its entirety . data storage drive ) on which the data exists . In some [ 0254 ] Removing Non - Coding Transcripts embodiments , RNA expression data is obtained via a [ 0255 ] In some embodiments , a method to process RNA secured server ( e.g. , a SFTP server, or Illumina BaseSpace ) . expression data ( e.g. , data obtained from RNA sequencing In some embodiments , data is obtained in the form of a ( also referred to herein as RNA - seq data )) comprises remov text - based filed ( e.g. , a FASTQ file ). In some embodiments, ing non - coding transcripts from annotated RNA expression a file in which sequencing data is stored also contains quality data . Aligning and annotating RNA expression data allows scores of the sequencing data ). In some embodiments , a file identification of coding and non - coding reads. In some embodiments, non - coding reads for transcripts are removed in which sequencing data is stored also contains sequence so as to concentrate analysis effort on expression of proteins identifier information . ( e.g. , those that may be involved in pathology of cancer ) . In [ 0249 ] Alignment and Annotation some embodiments, removing reads for non - coding tran [ 0250 ] In some embodiments, a method to process RNA scripts from the data reduces the variance in the data , e.g. , expression data ( e.g. , data obtained from RNA sequencing in replicates of the same or similar sample ( e.g. , nucleic acid ( also referred to herein as RNA - seq data )) comprises align from the same cells or cell - type ). In some embodiments , US 2021/0005284 A1 Jan. 7. 2021 25 non - limiting examples of expression data that is removed is incorporated herein by reference in its entirety. In some include one or more non - coding transcripts ( e.g. , 10-50 , embodiments, the following formula is used to calculate 50-100 , 100-1,000 , 1,000-2,500 , 2,500-5,000 or more non TPM : coding transcripts ) that belong to one or more gene groups selected from the list consisting of: pseudogenes , polymor phic pseudogenes , processed pseudogenes , transcribed pro A · 106 cessed pseudogenes, unitary pseudogenes , unprocessed ? ( ? ) pseudogenes, transcribed unitary pseudogenes, constant total reads mapped to gene . 103 chain immunoglobulin ( IG C ) pseudogenes, joining chain Where A = gene length in bp immunoglobulin ( IG J ) pseudogenes, variable chain immu noglobulin ( IG V ) pseudogenes , transcribed unprocessed pseudogenes, translated unprocessed pseudogenes , joining [ 0261 ] Removing Bias chain T cell receptor ( TR J ) pseudogenes, variable chain T [ 0262 ] Since conversion of RNA expression data to obtain cell receptor ( TR V ) pseudogenes, small nuclear RNAs expression in TPM format requires dividing the number of ( snRNA ), small nucleolar RNAs ( snoRNA ), microRNAs reads for a given transcript by the length of a transcript read , (miRNA ), ribozymes , ribosomal RNA ( rRNA ), mitochon biases may be introduced in the data for various reasons ( as drial tRNAs (Mt tRNA ), mitochondrial rRNAs (Mt rRNA ), described below ) . Accordingly, some embodiments of any small Cajal body -specific RNAs ( scaRNA ), retained introns, one of the methods described herein comprise identifying at sense intronic RNA , sense overlapping RNA , nonsense least one gene that introduces bias in the gene expression mediated decay RNA , non - stop decay RNA , antisense RNA , data . Some embodiments of any one of the methods long intervening noncoding RNAs ( lincRNA ), macro long described herein comprise identifying at least one gene that non - coding RNA (macro lncRNA ), processed transcripts , introduces bias in the gene expression data, and removing 3prime overlapping non - coding RNA ( 3prime overlapping expression data for the at least one gene from the gene ncrna ), small RNAs (sRNA ), miscellaneous RNA ( misc expression data to obtain bias -corrected gene expression RNA ), vault RNA ( vaultRNA ), and TEC RNA . data . [ 0256 ] In some embodiments, information ( e.g. , sequence [ 0263 ] In some embodiments , removing data from a data information ) for one or more transcripts for one of more of set may involve deleting the data from the dataset, marking these types of transcripts can be obtained in a nucleic acid the data so that it is not used in some or all subsequent database ( e.g. , a Gencode database , for example Gencode processing of the dataset, and / or doing any other suitable V23 , Genbank database , EMBL database , or other data processing so that the data is not used in some or all base ) . In some embodiments , a fraction ( e.g. , 10 % , 20 % subsequent processing of the dataset. For example, remov 30 % , 40 % , 50 % , 60 % , 70 % , 80 % , 90 % , 95 % , 98 % , 99 % , or ing particular expression data ( e.g. , expression data for at 99.5 % or more ) of the non - coding transcripts, histone least one gene introducing bias ) from gene expression data encoding gene, mitochondrial genes , interleukin - encoding may involve deleting the particular expression data from the genes, collagen - encoding genes , and / or T cell receptor gene expression data , marking the particular expression data encoding genes as described herein are removed from and / or doing any other suitable processing so that the particular expression data is not used in some or all subse aligned and annotated RNA expression data . quent processing of the gene expression data . As another [ 0257 ] Conversion to TPM and Gene Aggregation example, removing non - coding transcripts from the RNA [ 0258 ] In some embodiments, a method to process RNA expression data ( as described above ) may involve deleting expression data ( e.g. , data obtained from RNA sequencing the non - coding transcripts, marking the non -coding tran ( also referred to herein as RNA - seq data )) comprises nor scripts , and / or doing any other suitable subsequent process malizing RNA expression data per length of transcript ( e.g. , ing so that the non - coding transcripts are not used in some to transcripts per kilobase million ( TPM ) format ) that is or all subsequent processing of the RNA expression data . As read . In some embodiments, RNA expression data that is yet another example, removing sequence data , determined to normalized per length of transcript is first aligned and not pass one or more quality control checks during perfor annotated . Conversion of data to TPM allows presentation of mance of quality control techniques described herein , may expression in the form of concentration, rather than counts, involve deleting the sequence data , marking the sequence which in turn allows comparison of samples with different data and / or doing any other suitable processing so that the sequence data failing the quality control check ( s ) is not used total read counts and / or length of reads. in some or all subsequent processing . [ 0259 ] In some embodiments , RNA expression data that is [ 0264 ] In some embodiments, biases in expression data normalized per length of transcript read is then analyzed to converted to TPM format are attributed to transcripts of an obtain gene expression data ( expression data per gene ). This average length that is at least a threshold amount higher or is also referred to as gene aggregation . Gene aggregation lower than an average length of transcript as read in the comprises combining expression data in reads for transcripts entire expression data set . For example , a gene for which one for all isoforms of a gene to obtain expression data for that or more transcript of one or more isoforms has a length that gene . In some embodiments, gene aggregation to obtain is a threshold ( e.g. , at least 1 standard deviations, 2 standard gene expression data is performed after TPM normalization deviations, 3 standard deviations, 4 standard deviations, 5 but before identifying genes that introduce bias . In some standard deviations, 6 standard deviations, 7 standard devia embodiments, gene aggregation is performed before con tions , 8 standard deviations , 9 standard deviations, 10 stan version of the data to TPM . dard deviations, 11 standard deviations, 12 standard devia [ 0260 ] Wagner et al ( Theory Biosci . ( 2012 ) 131 : 281-285 ) tions , 13 standard deviations 13 standard deviations, or 15 provides an explanation of how TPM can be calculated and standard deviations or more ) lower from the mean or median US 2021/0005284 A1 Jan. 7 , 2021 26 transcript length in the entire expression data set , the expres performed experiments or previously performed processing sion of the gene in TPM format will artificially appear to be of data ) to processing RNA expression data and can be used high . Conversely, if a gene for which one or more reads of to filter out data for that family of genes. one or more isoforms is of a length that is a threshold ( e.g. , [ 0268 ] In some embodiments , a gene that introduces bias at least 1 standard deviations , 2 standard deviations, 3 to an expression data set may belongs to a family of genes standard deviations, 4 standard deviations, 5 standard devia having a polyA tail that is on average smaller or higher tions , 6 standard deviations, 7 standard deviations, 8 stan compared to an average length of polyA tails of genes from dard deviations, 9 standard deviations, 10 standard devia a sample from which the RNA expression data was obtained tions , 11 standard deviations, 12 standard deviations, 13 ( or another reference sample ). In some embodiments, standard deviations 13 standard deviations, or 15 standard " smaller or higher ” may refer to a numerical value that is deviations or more ) higher than the mean or median read smaller or higher relative to a known , average threshold length in the entire expression data set, the expression of the value of one or more genes . gene in TPM format will artificially appear to be low . In [ 0269 ] In some embodiments, a gene that introduces bias some embodiments, a threshold value is set in terms of to an expression data set belongs to a family of genes standard deviations ( e.g. , at least 1 standard deviations, 2 selected from the group consisting of : histone -encoding standard deviations, 3 standard deviations , 4 standard devia genes, mitochondrial genes, interleukin - encoding genes , tions , 5 standard deviations, 6 standard deviations, 7 stan collagen - encoding genes, B cell receptor encoding genes , dard deviations, 8 standard deviations, 9 standard devia and T cell receptor - encoding genes . In some embodiments , tions , 10 standard deviations , 11 standard deviations, 12 a gene that introduces bias to an expression data set can be standard deviations, 13 standard deviations 13 standard any other gene that has a polyA tail that is on average smaller deviations , or 15 standard deviations or more ). In some or higher compared to an average length of polyÀ tails of embodiments, a threshold value is set based on a length of genes from a sample from which the RNA expression data transcript and / or length of read , e.g. , below 5 bp , below 10 was obtained ( or another reference sample ). bp , below 15 bp , below 20 bp , below 25 bp , below 50 bp , [ 0270 ] In some embodiments, the histone - encoding genes , below 75 bp , below 100 bp , or below 150 bp or more . mitochondrial genes, interleukin - encoding genes, collagen [ 0265 ] In some embodiments , biases are attributed to the encoding genes , B cell receptor encoding genes, and / or T lengths of polyA tail on a transcript. In some embodiments , cell receptor - encoding genes are genes in the human sample RNA transcripts having a polyA tail that is on average that comprise a poly ) tail that is on average smaller or smaller or higher than the average length of polyA tail for higher compared to an average length of polyA tails of genes RNA transcripts in a sample are enriched more or less than from a sample from which the RNA expression data was the average enrichment of all RNA transcripts in a sample . obtained . For example , histone - encoding genes comprise a Accordingly , a gene may be associated with a polyA tail that polyA tail that is on average smaller to an average length of is at least a threshold amount smaller in length compared to polyA tails of genes from a sample from which the RNA an average length of polyA tails of genes from a sample from expression data was obtained . In some embodiments , his which the RNA expression data was obtained . In some tone - encoding genes do not comprise a polyA tail . In some embodiments , such expression data for such genes is also embodiments, a polyA tail is minimally or not detected in removed from the gene expression data to obtain bias histone - encoding genes . corrected gene expression data . Removing expression data [ 0271 ] In some embodiments, one or more gene or protein associated with one or more genes from a data set to reduce abbreviations or acronyms are used in this application to bias may be considered as a type of filtering of the data . In refer to the genes ( or genes encoding the proteins ) using some embodiments, “ filtration ” may refer to any one or their recognized scientific nomenclature. Additional infor more of removing expression data for genes that appear mation about the genes and / or encoded proteins can be artificially high or low ( e.g. , because of the lengths of found in one or more genetic sequence databases , for transcripts, or the length of the polyA tails associated with example the NIH genetic sequence database (GenBank , transcripts ), and removing expression data of non - coding www.ncbi.nlm.nih.gov ), the EMBL database ( the European RNA from data . Molecular Biology Laboratory nucleotide sequence data [ 0266 ] In some embodiments, identifying at least one gene base , www.ebi.ac.uk/embl/index.html) , the EMBL Euro that introduces bias in the gene expression data comprises pean Bioinformatics Institute database ( EMBL - EBI Euro analyzing the length of transcripts within the data set that is pean Nucleotide Archive , www.ebi.ac.uk/ena ), the being analyzed . In some embodiments, removing, from the GENCODE database ( www.gencodegenes.org ), or other gene expression data, expression data for at least one gene suitable database , the contents of which are incorporated by that introduces bias decreases variability and improves the reference herein for the different types of genes and names overall accuracy of subsequent gene expression - based of genes referred to herein . In some embodiments , the gene analysis . or protein abbreviations or acronyms are referring to the [ 0267 ] In some embodiments, identifying at least one gene human genes ( or human genes encoding the proteins ) . that introduces bias in the gene expression data comprises [ 0272 ] In some embodiments, a histone - encoding gene is use of knowledge gained from analyzing data outside of the HISTIHIA , HIST1H1B , HIST1HIC , HISTIHID , expression data set in questions, e.g. , using reference data HIST1H1E , HIST1HIT , HIST1H2AA , HIST1H2AB , sets . The inventors recognized that removing ( expression HIST1H2AC , HIST1H2AD , HIST1H2AE , HIST1H2AG , data for ) genes having polyA tail length that is outside the HIST1H2AH , HIST1H2AI , HIST1H2AJ , HIST1H2AK , average range of poly tails in a RNA expression data set HIST1H2AL , HIST1H2AM , HIST1H2BA , HIST1H2BB , effectively removes bias and / or outliers in the gene expres HIST1H2BC , HIST1H2BD , HIST1H2BE , HIST1H2BF , sion data . For example , knowledge that a certain family of HIST1H2BG , HIST1H2BH , HIST1H2BI , HIST1H2BJ , genes introduces biases can be had a priori ( from previously HIST1H2BK , HIST1H2BL , HIST1H2BM , HIST1H2BN , US 2021/0005284 A1 Jan. 7 , 2021 27

HIST1H2BO , HIST1H3A, HIST1H3B , HIST1H3C, issues in connection to the quality are solved , the processes HIST1H3D , HIST1H3E, HIST1H3F, HIST1H3G , of sample preparation are completed and bioinformatics HISTI??? , HIST1H31 , HIST1??? ,, HIST1H4A , analysis ( e.g. , post - sequencing) is performed. HIST1H4B , HIST1H4C , HIST1H4D , HIST1H4E , [ 0276 ] Aspects of methods and systems described herein HIST1H4F , HIST1H4G , HIST1H4H , HIST1H4I , provide for quality control to be performed on gene expres HIST1H4J , HIST1H4K , HIST1H4L , HIST2H2AA3, sion data to improve the accuracy and reliability of subse HIST2H2AA4 , HIST2H2AB , HIST2H2AC , HIST2H2BE , quent expression analysis ( e.g. , to determine a diagnosis , HIST2H2BF, HIST2H3A , HIST2H3C , HIST2H3D , prognosis, and / or treatment for the patient or subject) and HIST2H3PS2 , HIST2H4A , HIST2H4B , HIST3H2A , any resulting recommendation . HIST3H2BB , HIST3H3 , or HIST4H4 . In some embodi [ 0277 ] In some embodiments, bioinformatic quality con ments , a mitochondrial gene is MT- ATP6 , MT- ATP8 , MT trol of sequence data can be conducted as a standalone CO1 , MT -CO2 , MT- CO3 , MT- CYB , MT -ND1 , MT- ND2 , process ( e.g. , based on nucleic acid data that is received MT- ND3 , MT -ND4 , MT- ND4L , MT- ND5 , MT- ND6 , MT from a healthcare provider) or in connection with a prior RNR1, MT -RNR2 , MT- TA , MT - TC , MT- TD , MT- TE , MT sample preparation process ( e.g. , if a patient sample is TF , MT - TG , MT - TH , MT- TI, MT- TK , MT- TL1, MT - TL2, provided by the healthcare provider as opposed to nucleic MT - TM , MT - TN , MT- TP , MT- TQ , MT- TR , MT- TS1 , MT acid sequence data ) . As illustrated in FIG . 7 , act 301 to act TS2 , MT- TT, MT- TV , MT - TW , MT- TY , MTRNR2L1, 310 illustrate non - limiting sample preparation processes as MTRNR2L10 , MTRNR2L11 , MTRNR2L12 , described in the present disclosure , whereas act 311 to act MTRNR2L13 , MTRNR2L3, MTRNR2L4 , MTRNR2L5 , 315 illustrate non - limiting quality control processes as MTRNR2L6 , MTRNR2L7, or MTRNR2L8 . described in the present disclosure . In some embodiments, [ 0273 ] In some embodiments, removing expression data one or more of act 301 to act 310 can be performed for at least one gene that introduces bias in the gene independently ( e.g. , without one or more of act 311 to act expression data comprises removing expression data for one 315 ) . In some instances, one or more of act 301 to act 310 or multiple ( e.g. , at least 2 , at least 5 , at least 10 , at least 15 , can be skipped or delayed . Act 311 to act 315 can be at least 20 , at least 30 , at least 40 , at least 50 , at least 60 , at performed independently ( e.g. , without act 301 to act 310 ) . least 70 , at least 80 , at least 90 , at least 100 , at least 150 , at In some instances, one or more of act 311 to act 315 can be least 200 , at least 250 , at least 300 , at least 350 , at least 400 , skipped or delayed . In some instances, one or more sample at least 450 , at least 500 , between 2 and 1000 , orany suitable preparation ( act 301 to act 310 ) and quality control ( act 311 number of genes in these ranges ) genes in each of one or to act 315 ) processes can both be performed . In some multiple ( 2 , 3 , 4 , 5 , or all ) gene families including histone instances, one or more of the sample preparation processes encoding genes, mitochondrial genes , interleukin -encoding and one or more of the quality control processes can be genes , collagen - encoding genes , B cell receptor - encoding performed . genes , and T cell receptor -encoding genes ) . In some embodi [ 0278 ] In some embodiments, a process pipeline 300 is ments, removing expression data for at least one gene that performed by obtaining a first tumor sample from a subject introduces bias in the gene expression data comprises having, suspected of having, or at risk of having cancer at act removing expression date for any of one or more genes that 301 , extracting RNA from the first sample of the first tumor have a polyA tail that is on average smaller or higher at act 302 , enriching the extracted RNA for coding RNA to compared to an average length of polyA tails of genes from obtain enriched RNA at act 303 , preparing a first library of a sample from which the RNA expression data was obtained cDNA fragments from the enriched RNA for non - stranded ( or a reference sample ). RNA sequencing at act 304 , obtaining RNA expression data [ 0274 ] In some embodiments, after expression data for at for a subject having, suspected of having, or at risk of having least one gene that introduces bias is removed from the gene cancer at act 305 , aligning and annotating genes in the RNA expression data , the remaining gene expression data may be expression data with known sequences of the human normalized again ( “ renormalized ” ) ( e.g. , to TPM or any genome to obtain annotated RNA expression data at act 306 , other suitable unit such as reads per kilobase million removing non - coding transcripts from the annotated RNA ( RPKM ) or fragments per kilobase million ( FPKM ) ) so that expression data at act 307 , converting the annotated RNA the normalized expression values are not biased by the expression data to gene expression data in transcripts per expression data of the biasing gene ( s ), which was removed . kilobase million ( TPM ) at act 308 , identifying at least one In some embodiments, the remaining gene expression data gene that introduces bias in the gene expression data at act may have expression data for at least 1,000 genes , at least 309 , removing at least one gene from the gene expression 5,000 genes, at least 10,000 genes, between 500 and 5000 data to obtain bias - corrected gene expression data at act 310 , genes , between 1000 and 10,000 genes, between 5,000 and obtaining sequence information and asserted information at 15,000 genes or any suitable number of genes within these act 311 , determining one or more features from sequence ranges . information at act 312 , determining whether one or more features match asserted information at act 313 , making at Post - Sequencing Nucleic Acid Data Quality Control least one additional determination of the features at act 314 , [ 0275 ] As provided in the present disclosure , quality con identifying a cancer treatment for the subject using the trol is regularly performed during sample preparation pro bias - corrected gene expression data at act 315 . cesses . For example, the purity of the extracted nucleic acids [ 0279 ] In some embodiments , act 305 may comprise or the size distribution of the DNA libraries ) are detected . obtaining the RNA expression data by using a sequencing When one or more of the quality control issues occurs and platform or by receiving from a healthcare provider or is not able to be remedied in the laboratory, the provider laboratory . In some embodiments, act 306 may comprise ( e.g. , healthcare provider) of the biological sample is noti converting the RNA expression data to gene expression data . fied before proceeding to the subsequence steps . After the As described herein , the “ known sequence of the human US 2021/0005284 A1 Jan. 7 , 2021 28 genome” may refer to a reference . In some embodiments , act verify or retest questionable sequence information or outli 307 may comprise converting the RNA expression data to ers , etc. ) source and / or integrity ( e.g. , also as may be gene expression data . In some embodiments, act 307 may referred to herein as quality ) of sequence information which comprise obtain filtered RNA expression data . In some may be the subject of further use ( e.g. , being used for embodiments , act 308 may comprise normalizing the filtered analysis beyond the initial sequencing step ) , for example for RNA expression data to obtain gene expression data in diagnostic , prognostic, and / or clinical applications . transcripts per kilobase million ( TPM ) . In some embodi [ 0283 ] The disclosure recognizes the prevalence of next ments , the asserted information act 311 may indicate an generation sequencing techniques and platforms employed asserted source and / or an asserted integrity of the sequence across a variety of disciplines within the scientific commu data . In some embodiments, act 312 may comprise deter nity . The disclosure also recognizes the variety of protocols mining one or more disease features. In some embodiments, and methodologies associated with the different techniques act 312 may comprise processing the sequence information and platforms employed . The variation in the platforms, and or data to obtain determined information indicating a deter protocols to use the various platforms, creates variability mined source and / or a determined integrity of the sequence within the data and sequence information realized from the information or data . In some embodiments , act 313 may use thereof, which presents a significant hurdle in using the comprise determining whether the determined information sequence information for substantive analysis , especially if matches the asserted information . In some embodiments, the such sequence information is to be used for analysis beyond at least one additional determination of the feature at process the initial data run by the original user of the sample ( e.g. , 314 may comprise determining disease features or features by a secondary user , beyond the user who procured and that are not directly related to diseases. performed the initial sequencing, third parties to the [ 0280 ] Aspects of methods and systems described herein sequencing, etc. ) . provide an approach for validating nucleic acid sequence [ 0284 ] Accordingly, the disclosure presents a variety of data by obtaining both the sequence data and asserted methods and processes to assess the quality of sequence information related to one or more features of the sequence information ( e.g. , for correct identification of the sequence data ( e.g. , source , type of nucleic acid , expected integrity, information , sample identification, subject identification , etc. ) , determining one or more features from the sequence etc. ) , as well as to assess the integrity of the sequence data, and verifying that the one or more features determined information ( e.g. , create checkpoints to screen for various from the sequence data match the asserted information about integrity issues , for example , contamination or degradation ). those features. In some embodiments, the asserted informa For example, in some embodiments, described herein are tion can be information about the patient, tissue type , tumor methods for evaluating sequence information by obtaining type , nucleic acid type ( RNA , DNA , WES , polyA , etc. ) , sequence information from a nucleic acid of a sample of a sequencing protocol that was used , etc., or a combination subject, obtaining asserted information , determining a fea thereof. In some embodiments , the asserted information can ture ( e.g. , source , identity, status , characteristic ) of the be an expected and /or acceptable ( e.g. , acceptable for a sequence information, and comparing the asserted informa subsequent analysis of the sequence data ) integrity threshold tion with the determined information . The sequence infor for the sequence information , including for example, an mation may be obtained ( e.g. , acquired ) from any source , or expected and / or acceptable level of GC content, contami through any means known in the art. Accordingly, the nation , coverage ( e.g. , genome , exome, exon , protein encod sequence information may be generated using any suitable ing , or other coverage ) or other measure of integrity . sequencing technology . Alternatively, the sequence informa [ 0281 ] Nucleic acid sequencing, next generation sequenc tion may be obtained electronically from a third party that ing ( NGS ) in particular, allows for the generation of large generated the sequence information . In some embodiments, amounts of information for a given nucleic acid ( DNA , sequence information ( e.g. , reference sequence information ) RNA , genome , exome , transcriptome, etc. ). However, is obtained from an existing databank of sequences . In some because of the many different sequencing platforms that are embodiments , sequence information is obtained from a available , the variety of sample preparation and sequencing company, a non -profit organization , an academic institution , protocols and techniques that are used , and the variability or a healthcare organization . and inconsistency between platforms and protocols, there is [ 0285 ] In some embodiments, a sample may be any speci substantial variability in the content and coverage of the men , biopsy, or biological component obtained ( e.g. , pro resulting nucleic acid sequence information . Moreover, cured , taken , received ) from a subject. For example , in some when evaluating sequence information from several embodiments , the sample may be a blood sample , hair sequencing runs , or large sets of sequence information from sample, tissue sample, bodily fluid sample, cell sample, a plurality of sequencing runs ( e.g. , including, for example , blood component sample , or any other cell or tissue sample historical data from different medical visits for one or more from which a nucleic acid may be obtained for sequencing. patients ) or from different studies ( e.g. , from studies to [ 0286 ] In some embodiments, the subject may be any create prognostic or diagnostic evaluations , or from studies organism in need of treatment or diagnosis using methods or to evaluate the effect of a drug or treatment on the progres systems of the disclosure . For example, without limitation , sion of a disease , etc.) it can be can be challenging to subjects may include mammals and non -mammals . As used combine sequence information from different sources . In herein , a “ mammal, ” refers to any animal constituting the addition , in can be challenging to detect incorrectly identi class Mammalia ( e.g. , a human , mouse , rat, cat, dog , sheep , fied sequence data when large amounts of information are rabbit, horse , cow , goat , pig , guinea pig , hamster, chicken , being combined from different sources . turkey, or a non -human primate ( e.g. , Marmoset, Macaque ) ). [ 0282 ] Currently, no robust methods exist to validate ( e.g. , In some embodiments , the mammal is a human . In some raise the confidence , reduce the uncertainty, correct for or embodiments , the subject is a mammal. In some embodi omit low quality sequence information , provide a signal to ments , the subject is a human . US 2021/0005284 A1 Jan. 7 , 2021 29

[ 0287 ] In some embodiments, a sample may be a biologi from the other regions ( e.g. , unbound oligonucleotides ). cal sample obtained from a subject, e.g. , from a patient. In These tagged fragments can then be prepared and some embodiments, a sample may be blood , serum , sputum , sequenced. urine, or a tissue biopsy ( e.g. , from any tissue , including but [ 0291] In some embodiments , the nucleic acid is ribo not limited to heart, liver, pancreas , CNS , gastro -intestinal nucleic acid (RNA ). In some embodiments, sequenced RNA tract, mouth , colon , kidney, and skin ) . In some embodi comprises both coding and non -coding transcribed RNA ments , a sample may be suspected to be a disease sample found in a sample. When such RNA is used for sequencing ( e.g. , a cancer sample ). In some embodiments, a sample may the sequencing is said to be generated from " total RNA " and be a healthy sample ( e.g. , to be used as a reference ). also can be referred to as whole transcriptome sequencing. [ 0288 ] In some embodiments, sequence information is Alternatively, the nucleic acids can be prepared such that the obtained from a next generation sequencing platform ( e.g. , coding RNA ( e.g. , mRNA ) is isolated and used for sequenc IlluminaTM , RocheTM , Ion TorrentTM , etc. ) , or any high ing . This can be done through any means known in the art, throughput or massively parallel sequencing platform . In for example by isolating or screening the RNA for polyade some embodiments, these methods may be automated , in nylated sequences. This is sometimes referred to as mRNA some embodiments , there may be manual intervention . In Seq . Sequence information can include the sequence data some embodiments, the sequence information may be the generated by the nucleic acid sequencing protocol ( e.g. , the result of non -next generation sequencing ( e.g. , Sanger series of nucleotides in a nucleic acid molecule identified by sequencing ). In some embodiments, the sample preparation next- generation sequencing, sanger sequencing , etc.) as well may be according to manufacturer's protocols . In some as information contained therein ( e.g. , information indica embodiments, the sample preparation may be custom made tive of source , tissue type , etc. ) which may also be consid protocols, or other protocols which are for research , diag ered information that can be inferred or determined from the nostic , prognostic, and / or clinical purposes. In some sequence data. For example , in some embodiments RNA embodiments, the protocols may be experimental. In some sequence information may be analyzed to determine whether embodiments, the origin or preparation method of the the nucleic acid was primarily polyadenylated or not . sequence information may be unknown . [ 0292 ] Asserted information can refer to information [ 0289 ] In some embodiments, the size of the obtained about the sequence data , and by extension , the nucleic acid , RNA and / or DNA sequence data comprises at least 5 kilo the sample , and / or the subject from which the sequence data bases ( kb ). In some embodiments , the size of the obtained was obtained . In some embodiments , asserted information is RNA and / or DNA sequence data is at least 10 kb . In some provided along with the sequence data and can be verified by embodiments , the size of the obtained RNA and /or DNA analyzing the sequence data as described herein . The sequence data is at least 100 kb . In some embodiments, the asserted information may relate to a feature of the nucleic size of the obtained RNA and / or DNA sequence data is at acid , the sample, or the subject, and can be used to evaluate least 500 kb . In some embodiments, the size of the obtained the quality of the nucleic acid ( e.g. , the source or integrity RNA and / or DNA sequence data is at least 1 megabase of the nucleic acid ) . Asserted information can refer to an ( Mb ). In some embodiments, the size of the obtained RNA asserted source and / or an asserted integrity of the sequence and / or DNA sequence data is at least 10 Mb . In some data or information . embodiments , the size of the obtained RNA and / or DNA [ 0293 ] In some embodiments, a third party may provide sequence data is at least 100 Mb . In some embodiments , the sequence data as well as the related asserted information . In size of the obtained RNA and / or DNA sequence data is at some embodiments, the asserted information is obtained least 500 Mb . In some embodiments, the size of the obtained from the same entity that the sequence data is obtained from . RNA and / or DNA sequence data is at least 1 gigabase ( Gb ). In some embodiments , the asserted information and In some embodiments, the size of the obtained RNA and /or sequence data are obtained from different parties . In some DNA sequence data is at least 10 Gb . In some embodiments , embodiments, the asserted information is obtained from a the size of the obtained RNA and / or DNA sequence data is database . In some embodiments, the asserted information is at least 100 Gb . In some embodiments, the size of the a reference value or property . In some embodiments , the obtained RNA and / or DNA sequence data is at least 500 Gb . asserted information may allege an identity of the sequence [ 0290 ] In some embodiments, the sequence information information , an identity of the nucleic acid of the sequence may be generated using a nucleic acid from a sample from information , an identity of the sample from which the a subject. In some embodiments, the sequence information sequence information was generated, an identity of the may be a sequence data indicating a nucleotide sequence of subject from which the sample was obtained . In some DNA and / or RNA from a previously obtained biological embodiments, the asserted information may identify the sample of a subject having, suspected of having , or at risk of sequence data as obtained from polyadenylated RNA , as having a disease . In some embodiments, the nucleic acid is originating from whole transcriptome sequencing, or as deoxyribonucleic acid (DNA ). In some embodiments, the being from WES . In some embodiments, the asserted infor nucleic acid is prepared such that the whole genome is mationmay identify a cell or tissue type for the sample from present in the nucleic acid . In some embodiments, the which the nucleic acid was obtained . In some embodiments, nucleic acid is processed such that only the protein coding the asserted information may allege a tumor type for the regions of the genome remain ( e.g. , exomes ) . When nucleic sample from which the nucleic acid was obtained . In some acids are prepared such that only the exomes are sequenced , embodiments, the asserted information may identify an it is referred to as whole exome sequencing ( WES ) . A MHC profile ( e.g. , sequences for alleles of the MHCs of the variety of methods or known in the art to isolate the exomes subject from which the nucleic acid was obtained ) for the for sequencing, for example, solution based isolation subject from which the sample was obtained . In some wherein tagged probes are used to hybridize the targeted embodiments, the asserted information may identify an regions ( e.g. , exomes ) which can then be further separated expected protein subunit ratio for the sample. In some US 2021/0005284 A1 Jan. 7 , 2021 30 embodiments , the asserted information may provide an mined , matched , aligned , measured , assessed ) against the expected complexity value for the sequence information . In asserted information . This evaluation can be done to some embodiments, the asserted information may provide increase the confidence that the sequence information is of an expected contamination value for the sequence informa a particular origin ( e.g. , source ), is identified correctly, or tion . In some embodiments , the asserted information may has a particular or specific characteristic ( e.g. , is from provide an expected coverage value for the sequence infor polyadenylated nucleic acids ) . In this respect, the methods mation . In some embodiments , the asserted information may can be used to provide checkpoints and measures to high provide an expected exon coverage value for the sequence light any potential problems ( e.g. , non -matching values information . In some embodiments , the asserted information ( e.g. , for determined features and asserted information ), or may provide an expected read composition value for the determined values which fall outside of accepted or estab sequence information . In some embodiments, the asserted lished ranges ). Such problems can signify or signal prob information may provide an expected Phred score for the lems with integrity ( e.g. , degraded or contaminated ) or sequence information . In some embodiments , the asserted source ( e.g. , misidentified , wrongly labeled, etc. ) of the information may provide an expected single nucleotide sequence data . By using the methods and processes herein , polymorphism ( SNP ) value for the sequence information . In and matching determined features to asserted information some embodiments, the asserted information may relate to a for given sequence information , it lessens the possibility of GC content value for the sequence information . In some incorrect or poor quality sequence data being used for embodiments, the asserted information may comprise addi analysis and raises the confidence that the sequence data is tional information . In some embodiments , the asserted infor of sufficient quality to be used for diagnostic, prognostic , mation may comprise information relating to multiple or and / or clinical analyses. more than one feature of the sequence information . In some [ 0297 ] In some embodiments , evaluating whether deter embodiments , the asserted information is any combination mined information matches asserted information involves of the aforementioned features ( e.g. , determined values , determining whether the determined information matches properties, characteristics , etc. ) . asserted information exactly are within a specified threshold . [ 0294 ] As used herein , a " feature ” may be a property or More generally, in some embodiments , evaluating whether characteristic , which is determined from analysis of the two values “ match” may involve determining whether the sequence information which provides the user with infor two values match exactly or are within a specified threshold . mation about the sequence information , the sample from That threshold may be 0 , requiring exact matches in some which it was taken , and / or the subject from which the embodiments . That threshold may be greater than 0 such that sample was taken , beyond the sequence of the nucleotides of when numerical values are being compared , the numerical the sequence information . The sequence information may be values may be said to “match ” if their values are within the in connection with the gene expression data obtained from threshold of one another ( e.g. , when the absolute difference a healthcare provider or a laboratory. For example, a feature of the numerical values is less than or equal to the threshold may be indicative of a source ( e.g. , patient, subject, nucleic value ) . In some embodiments, the threshold may be set as a acid type ), patient or subject identity, tissue type , tumor type , function of a standard deviation ( or a multiple thereof ), a status, MHC sequence , protein subunits, quantile , a percentile, or any other suitable statistical quan complexity, contamination , coverage ( e.g. , total sequence , tity. In some embodiments, evaluating whether two values exon , etc. ), read composition , quality and / or Phred Score, " match ” may involve determining, when there is a difference single nucleotide polymorphism ( SNP ) positions , and / or GC between the two values , whether that difference is statisti content. The feature ( s ) of the sequence information can then cally significant. Such a determination may be performed be indicative of whether the sequence information is poten using a statistical hypothesis test , a threshold , or any other tially a match or mismatch to the asserted information from suitable statistical or mathematical technique, as aspects of a healthcare provider or a laboratory . the technology described herein are not limited in this [ 0295 ] When considering identity or source , it is important respect. to recognize that the term not only refers to identifying a specific subject or patient as a particular individual, but also [ 0298 ] In some embodiments, one or more quality control that one or more features of sequence information for one parameters are checked for the bioinformatics data . In some sample can be identified as being the same as the one or embodiments, tumor purity can be checked . Tumor purity, as more features of sequence information obtained from described herein , may refer to the proportion of cancer cells another sample . For example, sequence information A may in the admixture . In some embodiments, the target tumor be compared to sequence information B which is presented purity for the WES is 220 % ( e.g. , 20 % , 40 % , 60 % ) . In some and asserted to be from the same nucleic acid , subject or embodiments, the target tumor purity for the RNA - seq is patient, tissue , or tumor . The identity can be corroborated or 220 % ( e.g. , 20 % , 40 % , 60 % ). questioned by the methods herein without knowing the [ 0299 ] In some embodiments, depth of coverage can be actual identity of the subject but can support the finding that checked . In some embodiments , the depth of coverage for the identity is consistent with another given sequence infor the WES is 2150x average coverage of tumor sample ( e.g. , mation . In some embodiments , the identity of the sequence 150x , 180x , 200x ) . In some embodiments, the target depth information is used to compare to asserted information for a of coverage for the RNA -seq is z100x ( e.g. , 100x , 150x , given sample, subject, tissue , or tumor. In some embodi 200x ) . ments the identity of the sequence information is used to [ 0300 ] In some embodiments , alignment rate can be compare the asserted information for another nucleic acid or checked . In some embodiments, the target alignment rate for reference value . the WES is more than 90 % ( e.g. , 91 % , 95 % , 99 % ) . In some [ 0296 ] In some embodiments, these determined features of embodiments, the target alignment rate for the RNA - seq is the sequence information are then evaluated ( e.g. , deter more than 90 % ( e.g. , 91 % , 95 % , 99 % ) . US 2021/0005284 A1 Jan. 7 , 2021 31

[ 0301 ] In some embodiments , base call quality scores such 4.5 , 3 , 2.5 ) . In some embodiments, the threshold for tumor as Phred score can be checked . In some embodiments, the RNA seq tissue versus normal WES tissue for the RNA seq target Phred score for the WES is more than 30 ( e.g. , 35 , 40 , is less than 5 ( e.g. , 4.5 , 3 , 2.5 ) . 50 ) . In some embodiments, the target Phred score for the [ 0310 ] In some embodiments, sequence information can RNA - seq is more than 30 ( e.g. , 35 , 40 , 50 ) . be assessed for genome contamination ( e.g. , non - human [ 0302 ] In some embodiments, uniformity of coverage can genome contamination ) . In some embodiments, the samples be checked . In some embodiments, the target uniformity of or sequence information are assessed to determine whether coverage for the WES is 85 % base pairs in target regions they are contaminated by determining whether they contain covered 20x for tumor tissue ( e.g. , 85 % , 95 % , 99 % ) . In sequences from other species or reference genomes such as some embodiments, the target uniformity of coverage for the mouse , zebrafish , drosophila , celegans, saccharomyces, WES is 85 % base pairs in target regions coveredz20x for arabidopsis, microbiome, mycoplasma, adapters, UniVec, normal tissue ( e.g. , 85 % , 95 % , 99 % ) . In some embodiments , and phiX rRNA . In some embodiments, the target threshold target regions for determining uniformity of coverage may for the ADA genomes contamination for the WES is more be ExonV7 target regions with the use of the coding regions than 60 ( e.g. , 65 , 70 , 80 ) . In some embodiments, the from CCDS ( consensus coding sequence ) genes . acceptable threshold for the ADA genomes contamination [ 0303 ] In some embodiments, GC bias can be checked . In for the WES is more than 40 ( e.g. , 45 , 60 , 80 ) . In some some embodiments , the target GC bias for the WES is at embodiments, the target threshold for the ADA genomes least 50 ( e.g. , 50 , 60 , 70 ) . In some embodiments, the contamination for the RNA seq is more than 40 ( e.g. , 50 , 60 , acceptable range of GC bias for the WES is at least 45-65 80 ) . In some embodiments , the acceptable threshold for the ( e.g. , 45-65 , 50-65 , 55-65 ) . In some embodiments , the target ADA genomes contamination for the RNA seq is more than GC bias for the RNA - seq is at least 50 ( e.g. , 50 , 60 , 70 ) . In 20 ( e.g. , 30 , 50 , 70 ) . some embodiments , the acceptable range of GC bias for the [ 0311 ] In some embodiments, only one feature is evalu RNA - seq is at least 45-65 ( e.g. , 45-65 , 50-65 , 55-65 ) . ated against an asserted information . In some embodiments , [ 0304 ] In some embodiments, mapping quality can be more than one feature is evaluated against an asserted checked . In some embodiments, the mapping quality for the information . In some embodiments, at least two or more WES is > 10 ( e.g. , 10 , 20 , 30 ) . features are evaluated against an asserted information . In [ 0305 ] In some embodiments , duplication rate can be some embodiments, at least three or more features are checked . In some embodiments, the duplication rate for the evaluated against an asserted information . In some embodi WES is less than 30 % ( e.g. , 29.9 % , 25 % , 15 % ) . In some ments, at least four or more features are evaluated against an embodiments , the duplication rate for the RNA - seq is less asserted information . In some embodiments , at least five or than 85 % ( e.g. , 84.99 % , 80 % , 70 % ) . more features are evaluated against an asserted information . [ 0306 ] In some embodiments , insert size can be checked . In some embodiments , at least six or more features are In some embodiments , the acceptable median insert size for evaluated against an asserted information . In some embodi tumor tissue for the WES is about 150 ( e.g. , 150 , 280 , 250 ) . ments, at least seven or more features are evaluated against In some embodiments , the target median insert size for an asserted information . In some embodiments, at least eight tumor tissue for the WES is about 200 ( e.g. , 200 , 250 , 350 ) . or more features are evaluated against an asserted informa In some embodiments , the acceptable median insert size for tion . In some embodiments, at least nine or more features are normal tissue for the WES is about 150 ( e.g. , 150 , 280 , 250 ) . evaluated against an asserted information . In some embodi In some embodiments, the target median insert size for ments , at least ten or more features are evaluated against an normal tissue for the WES is about 200 ( e.g. , 200 , 250 , 350 ) . asserted information . In some embodiments , at least eleven In some embodiments , the acceptable median insert size for or more features are evaluated against an asserted informa tumor tissue for the RNA seq is about 150 ( e.g. , 150 , 280 , tion . In some embodiments, at least twelve or more features 250 ) . In some embodiments , the target median insert size for are evaluated against an asserted information . In some tumor tissue for the RNA seq is about 200 ( e.g. , 200 , 250 , embodiments, at least thirteen or more features are evaluated 350 ). against an asserted information . In some embodiments, at [ 0307 ] In some embodiments , contamination can be least fourteen or more features are evaluated against an checked . In some embodiments, contamination acceptable asserted information . In some embodiments , at least fifteen for the WES is less than 0.05 % ( e.g. , 0.04 % , 0.03 % , 0.01 % ) . or more features are evaluated against an asserted informa In some embodiments, contamination acceptable for the tion . RNA - seq is less than 0.05 % ( e.g. , 0.04 % , 0.0 [ 0312 ] In some embodiments, if the features or determined [ 0308 ] In some embodiments, SNP concordance of a pair values are found to not meet or match the asserted infor of tumor versus normal samples from the same patient can mation , additional steps are performed . In some embodi be checked . In some embodiments, the target SNP concor ments, if the features or determined values are found to not dance for the WES is more than 90 % ( e.g. , 91 % , 95 % , 98 % ) . meet or match the asserted information , the sequence infor In some embodiments, the acceptable SNP concordance for mation is rejected ( e.g. , is not used for subsequent analysis ). the WES is more than 85 % ( e.g. , 86 % , 90 % , 98 % ) . In some In some embodiments , if the features or determined values embodiments, the target SNP concordance for the RNA seq are found to not meet or match the asserted information , the is more than 90 % ( e.g. , 91 % , 95 % , 98 % ) . In some embodi sequence information is retested, meaning that any evalua ments, the acceptable SNP concordance for the RNA seq is tion of features or determinations are performed for at least more than 85 % ( e.g. , 86 % , 90 % , 98 % ) . another or second , or more ( e.g. , third , fourth , fifth , sixth , [ 0309 ] In some embodiments, HLA allele concordance of etc. ) time . In some embodiments, if the features or deter a pair of tumor versus normal samples from the same patient mined values are found to not meet or match the asserted can be checked . In some embodiments , the threshold for information , another or second , or more ( e.g. , third , fourth , normal versus tumor tissue for the WES is less than 5 ( e.g. , fifth , sixth , etc. ) sequence information is obtained and then US 2021/0005284 Al Jan. 7 , 2021 32 tested , meaning that any evaluation of features or determi tissue of origin from which the nucleic acid obtained ; ( iii ) a nations are performed for at least one time , or second , or tumor type of from which the nucleic acid was obtained ; ( iv ) more ( e.g. , third , fourth , fifth , sixth , etc. ) time , independent a quality measure of the first RNA sequence data ; ( v ) of the initial determinations and evaluations done on the first whether the RNA sequence data was obtained from poly sequence information . In some embodiments, if the features adenylated (polyA ) RNA or total RNA ; ( vi ) if the first or determined values are found to not meet or match the sequence data set is first WES sequence data , ( vii ) the asserted information , the sequence information is reported to sequencing platform that was used to generate the first a user as such . In some embodiments , any combination of sequence data set ; and ( viii ) a quality measure of the first these steps may be performed in the event that features or sequence data set . determined values are found to not meet or match the [ 0316 ] In some embodiments, the method further com asserted information . In some embodiments, if the features prises obtaining additional sequence information if the one or determined values are found to not meet or match the or more features of the sequence information is below a asserted information , the sequence information can still be quality control threshold suitable for further analysis. evaluated for characteristics related to disease ( e.g. , cancer ), [ 0317 ] In some embodiments , the evaluated feature is the but information about the quality ( e.g. , the extent and nature subject identity. In some embodiments, the subject identity of the one or more features of the determined sequence is determined by performing one or more of evaluations information that do not match the asserted information ) can from the group comprising: a major histocompatibility com be provided to a user ( e.g. , to a physician or other medical plex evaluation and a SNP concordance evaluation , wherein practitioner ). In some embodiments, the characteristics the results of the evaluations are compared to an asserted relate to the type of cancer , its environment, its stage , its value for the subject or a second sequence data set from the location , its tissue of origin , its statistical likelihood of subject . responding to various treatments or therapies, or other [ 0318 ] In some embodiments , the evaluated feature is the properties which may aid a practitioner in treating the tissue of origin . In some embodiments, the tissue of origin subject. In some embodiments , if the features or determined is determined by performing one or more of evaluations values are found to meet or match the asserted information from the group comprising: protein expression and bio ( e.g. , match , exceed , or otherwise satisfy reference or thresh marker analysis . In another aspect , the disclosure relates to old values ) , then additional steps may be performed . In some a method of evaluating a feature comprising assigning a embodiments , if the features or determined values are found tissue of origin of the sample which generated the sequence to meet or match the asserted information ( e.g. , match , information . In some embodiments, the method comprises exceed , or otherwise satisfy reference or threshold values ) , evaluating the sequence information for markers or ne then additional steps may be performed . In some embodi expression indicative of a tissue type from which the ments , if the features or determined values are found to meet sequence information originated . In some embodiments, the or match the asserted information ( e.g. , match , exceed , or method comprises evaluating the markers or gene expres otherwise satisfy reference or threshold values ) , the sion against a database of the same for different tissue types . sequence information is evaluated for characteristics related Different tissues throughout a subject express different pro to cancer . In some embodiments, the characteristics relate to teins which create a profile of such a tissue . Accordingly, it the type of cancer, its environment, its stage , its location , its is possible to evaluate the protein expression profile and tissue of origin , its statistical likelihood of responding to match it to a tissue type to identify the tissue from which the various treatments or therapies, or other properties which sample, and by extension the sequence information , was may aid a practitioner in treating the subject. obtained . This can be done through a variety of methods [ 0313 ] In some embodiments, after one or more quality known in the art. For example, evaluating the number of a control steps are performed , a report is generated for the user given messenger RNA ( mRNA ) transcript ( e.g. , using it as with the results of the quality control steps that were a proxy for evaluating protein expression ) can be evaluated performed . against a database of known tissue markers ( e.g. , protein [ 0314 ] Accordingly, in one aspect , the disclosure relates to expression profiles ), it can be evaluated against a provided a method of evaluating sequence information of at least one set of markers for a subject, or it can be evaluated against a nucleic acid , to determine at least one feature thereof. The at second sequence information or set of tissue markers least one feature can be used to evaluate the quality or obtained from the subject. In some embodiments , a tissue of integrity of the sequence information , to interrogate the origin is determined by evaluating the sequence information source of the sequence information , or to allow for analyses for markers ( e.g. , protein expression ) and matching the of other sequence information , which may or may not be markers with a database of tissues . In some embodiments, a from the same sequencing platform , or from the same or a tissue of origin is determined by evaluating the sequence different sample preparation protocol. Further the at least information for markers ( e.g. , protein expression ) and one feature may be used as a quality control measure to matching the markers with a set of markers from a tissue of ensure subsequent analyses of a threshold quality and lower a subject. In some embodiments , a tissue of origin is quality sequence information are omitted . determined by evaluating the sequence information for [ 0315 ] Accordingly, in one aspect , the disclosure relates to markers ( e.g. , protein expression ) and matching the markers a method of evaluating sequence information by ( a ) obtain with a second sequence information obtained from a subject ing sequence information which comprises: ( 1 ) sequence where the tissue of origin is known . data from a first ribonucleic acid (RNA ); or ( 2 ) sequence [ 0319 ] In some embodiments , the evaluated feature is a data from a first whole exome sequence ( WES ) ; and ( b ) measure of the integrity of the sequence information . In determining one or more features of the sequence data some embodiments , the integrity measure of the first RNA selected from the group consisting of: ( i ) the identity of the sequence data is determined by performing one or more of subject from which the nucleic acid was obtained ; ( ii ) a evaluations from the group comprising: determining cover US 2021/0005284 A1 Jan. 7 , 2021 33 age of one or more genes in the RNA sequence data , threshold ) the sequence information from each of the nucleic determining relative coverage of two or more exons for at acid samples is deemed sufficiently likely from the same least one gene in the RNA sequence data, determining an source , of sufficient quality , and is retained for further expression ratio of two known reference genes from the analysis and / or reported as such to a user . RNA sequence data , or other feature or combinations of two [ 0323 ] In some embodiments, the quality of sequence or more thereof. In some embodiments, the integrity mea information from one or more ( e.g. , at least two) nucleic acid sure of the DNA sequence data is determined by performing samples is evaluated by determining a concordance value for one or more of evaluations from the group comprising : total single nucleotide polymorphisms ( SNPs ) in the sequence coverage and / or chromosomal coverage of the DNA information . In some embodiments , the method further sequence data , or other feature or combinations of two or comprises evaluating the concordance value . In some more thereof. embodiments, if the concordance value is less than 85 % , less [ 0320 ] In some embodiments, the RNA sequence data is than 80 % , or less than 75 % , the sequence information from analyzed to determine whether it was obtained from polyA each of the nucleic acid samples is deemed likely from RNA or total RNA . In some embodiments , the RNA distinct sources , of insufficient quality , and is removed , sequence data is analyzed by evaluating an expression level discarded , retested , and / or reported as such to a user. In of one or more mitochondrial or genes from the some embodiments , if the concordance value is less than RNA sequence data , and / or other features that are charac 75 % , the sequence information is deemed not acceptable . In teristic of polyA or total RNA . In some embodiments , the some embodiments, if the concordance value is more than feature being evaluated is the sequencing platform that was 80 % and less than 95 % , the sequence information is deemed used to generate the sequence . In some embodiments, the within the ranges that are close to be not acceptable . In some sequencing platform used for generating the WES sequence embodiments, if the concordance value is more than 95 % , data is determined by performing one or more of evaluations the sequence information is deemed acceptable . In some from the group comprising: determining % variance for one embodiments , if the concordance value is at least 75 % , at or more reference genes in the WES sequence data , or other least 80 % , or at least 85 % , the sequence information from property of sequencing data that is characteristic of the each of the nucleic acid samples is deemed sufficiently likely sequencing platform that was used to generate the sequence from the same source , of sufficient quality, and is retained , data . and / or reported as such to a user . In some embodiments , at [ 0321 ] In some embodiments, methods comprise evaluat least 5,000 SNPs can be evaluated for concordance values . ing at least one of the features described herein . In some In some embodiments, at least 6,000 SNPs can be evaluated embodiments, a method comprises evaluating at least two of for concordance values . In some embodiments, at least the features described herein . In some embodiments , a 7,000 SNPs can be evaluated for concordance values . In method comprises evaluating at least three of the features some embodiments, at least 8,000 SNPs can be evaluated for described herein . In some embodiments , a method com concordance values . prises evaluating at least four of the features described [ 0324 ] In some embodiments, the quality of sequence herein . In some embodiments , a method comprises evalu information from one or more ( e.g. , at least two) nucleic acid ating at least five of the features described herein . In some samples is evaluated by determining a contamination value embodiments, a method comprises evaluating at least six of for the sequence information . In some embodiments, if the the features described herein . In some embodiments, a contamination value is above a statistically significant method comprises evaluating at least seven of the features threshold , the sequence information is removed , discarded , described herein . retested , and / or reported as such to a user. In some embodi [ 0322 ] In some embodiments , the quality ( e.g. , source or ments , if the contamination value is more than 0.05 % ( e.g. , integrity ) of sequence information from one or more nucleic 0.06 % , 1 % , 2 % ) , the sequence information is deemed to be acid samples ( e.g. , at least two nucleic acid samples ) is close to not acceptable ( e.g. , warning ). In some embodi evaluated by ( a ) determining the sequence of two or more ments, if the contamination value is more than 0.1 % ( e.g. , ( e.g. , 2 , 3 , 4 , 5 , 6 or more ) major histocompatibility com 0.1 % , 0.5 % , 1 % ) , the sequence information is deemed to be plexes ( MHCs ) , and ( b ) determining whether the MHCS not acceptable for blood sample and fresh frozen tissue . In from the one or more samples match . In some embodiments , some embodiments, if the contamination value is below the if the MHCs don't match ( e.g. , if a calculated agreement threshold , the sequence information is retained , and / or value is less than a statistically significant threshold ) the reported as such to a user. sequence information from each of the nucleic acids is [ 0325 ] In some embodiments, the quality of sequence deemed likely from distinct sources , of insufficient quality , information from one or more ( e.g. , at least two ) nucleic acid and is removed , discarded , retested, and / or reported as such samples is evaluated by analyzing sequence information to a user . In some embodiments, if the calculated agreement from the one or more nucleic acid samples against a set of value ( x ) between WES normal/ tumor /RNAseq is 0 < x < 2 tumor types , determining predicted tumor type (s ) from the ( e.g. , 1 , 1.5 , 2 ) , it represents acceptable and “ warning. ” sequence information , and determining whether the pre Warning means that the calculated agreement value is within dicted tumor type ( s) matches the tumor type ( s ) that were a range that is deemed acceptable but is considered close to provided ( e.g. , asserted ) for the one or more nucleic acid be not acceptable . In some embodiments , if the calculated samples. In some embodiments , determining predicted agreement value ( x ) between WES normal/ tumor /RNAseq tumor type ( s ) as a quality control step can be performed by is > 5 , it represents not acceptable or bad quality . In some using a computerized system or process as described herein . embodiments, if the calculated agreement value ( x ) between In some embodiments , determining predicted tumor type ( s ) WES normal /tumor /RNAseq is 0 , it represents good quality. as a quality control step can be performed by determining a In some embodiments, if the MHCs match ( e.g. , if the cancer grade from sequence data by using machine learning agreement value is at or above the statistically significant techniques, as described herein and in the U.S. Provisional US 2021/0005284 A1 Jan. 7 , 2021 34

Patent Application Ser. No. 62 / 943,976 , titled " Machine determining the tissue of origin and another set of genes may Learning Techniques for Gene Expression Analysis , " filed be used for determining the cancer grade . on Dec. 5 , 2019 , which is incorporated by reference herein [ 0331 ] In some embodiments, the expression data may be in its entirety. In some embodiments, if there is disagreement obtained for cells in the biological sample, where the subject between the tumor type ( s ) ( e.g. , cancer grades) obtained has , is suspected of having or is at risk of having cancer . In from the sequence evaluation and an asserted information , the context where tissue of origin is a characteristic being the sequence information is identified as suspect , or insuf determined , the tissue of origin is for the cells in the ficient quality , and is removed , discarded , retested , and / or biological sample . The tissue of origin may refer to a reported as such to a user . In some embodiments , if there is particular tissue type from which the cells originate, such as agreement between the predicted tumor type ( s ) and expected lung, pancreas , stomach, colon , liver, bladder, kidney, thy tumor type ( s ) for the one or more nucleic acid samples , the roid , lymph nodes , adrenal gland, skin , breast, ovary, and sequence information is deemed of sufficient quality, and is prostate. retained , and / or reported as such to a user . [ 0332 ] For example , some embodiments involve using a [ 0326 ] In some embodiments, matching the predicted gene set for predicting tissue of origin , which may include tumor type ( s ) to the tumor type ( s ) that were provided cell of origin , for Diffuse Large B - Cell Lymphoma comprise using a set of reference genes from a training data ( DLBCL ) , such as germinal center B - cell ( GCB ) and acti set containing a plurality of signature genes that are up vated B - cell ( ABC ) . Genes in the gene set may be selected regulated or down -regulated in a certain tumor type , relative from the group consisting of : ITPKB , MYBL1, LMO2 , to a normal, healthy sample. For instance , if the predicted BATF, IRF4 , LRMP, CCND2 , SLA , SP140 , PIM1 , CSTB , tumor type is prostate cancer ( e.g. , asserted information ), the BCL2 , TCF4 , P2RX5 , SPINK2 , VCL , PTPN1 , REL , FUT8 , sample will be checked against the known reference genes RPL21 , PRKCB1 , CSNKIE , GPR18 , IGHM , ACP1 , SPIB , of prostate cancer . In some embodiments, the predicted HLA - DQA1 , KRT8 , FAM3C , and HLA - DMB . tumor type is also evaluated against its tumor grade, which [ 0333 ] In the context where cancer grade is a characteristic can help determine the signature genes of the asserted cancer being determined , the cancer grade is for the cells in the grade at different stages of cancer . biological sample. The cancer grade may refer to prolifera [ 0327 ] As described above , in some embodiments, deter tion and differentiation characteristics of the cells in the mining predicted tumor type ( s ) as a quality control step may biological sample and refer to a numerical grade that is be performed by determining a cancer grade from sequence generally determined by visual observation of cells using information by using a machine learning approach employ microscopy, such as Grade 1 , Grade 2 , Grade 3 , and Grade ing a statistical model trained using training data . [ 0328 ] For example, in some embodiments , a statistical [ 0334 ] For example , some embodiments involve using a model may be used to predict characteristic ( s ) of a biologi gene set for predicting breast cancer grade . Genes in the cal sample , using gene expression data, based on an input gene set may be selected from the group consisting of: ranking of genes , ranked based on their respective expres UBE2C , MYBL2 , PRAME , LMNB1, CXCL9 , KPNA2, sion levels, for a sequencing platform . Using the input TPX2 , PLCH1 , CCL18 , CDK1 , MELK , CCNB2 , RRM2 , ranking ( s ), instead of the specific values for the expression CCNB1 , NUSAP1, SLC7A5 , TYMS , GZMK , SQLE , levels , allows for the same or similar data processing pipe Clorf106 , CDC125B , ATAD2, QPRT, CCNA2 , NEK2 , line to be used across different expression data regardless of IDO1 , NDC80 , ZWINT, ABCA12 , TOP2A , TDO2 , the specific manner in which the expression levels were S100A8 , LAMP3 , MMP1 , GZMB , BIRC5 , TRIP13 , obtained ( e.g. , regardless of which sequencing platform , RACGAP1, ASPM , ESRP1 , MAD2L1 , CENPF, CDC20 , sequencing conditions, sample preparation , data processing MCM4 , MK167 , PBK , CKS2 , KIF2C , MRPL13 , TTK , to obtain expression levels , etc. ). In some embodiments , a BUB1 , TK1 , FOXM1 , CEP55 , EZH2 , ECT2 , PRCI , statistical model may be used to predict cancer grade of the CENPU , CCNE2 , AURKA , HMGB3 , APOBEC3B , biological sample. In some embodiments, a statistical model LAGE3 , CDKN3, DTL , ATP6V1C1, KIAA0101, CD2 , may be used to predict tissue of origin of the biological KIF11 , KIF20A , CDCA8 , NCAPG , CENPN , MTFR1 , sample , which also may be used for performing quality MCM2 , DSCC1 , WDR19 , SEMA3G , KCND3 , SETBP1 , control as described herein . KIF13B , NR4A2 , NAV3, PDZRN3, MAGI2 , CACNA1D , [ 0329 ] For example , in some embodiments, rankings of STC2 , CHAD , PDGFD , ARMCX2 , FRY, AGTR1 , genes based on the gene expression levels ( in a biological MARCHS, ANG , ABAT, THBD , RAI2 , HSPA2, ERBB4 , sample) as determined by a sequencing platform may be CHDC2, FST EPHX2 , FOSB , STA 13 , ID4 , provided as input to a statistical model trained to predict FAM129A , FCGBP, LAMA2, FGFR2 , PTGER3 , NME5 , tissue of origin for the biological sample . The predicted LRRC17 , OSBPLIA , ADRA2A , LRP2 , Clorf115 , tissue of origin may be compared against asserted tissue of COL4A5 , DIXDC1 , KIAA1324 , HPN , KLF4 , SCUBE2 , origin as part of the quality control techniques described FMO5 , SORBS2 , CARD10 , CITED2 , MUC1, BCL2 , herein . As another example, in some embodiments, rankings RGS5 , CYBRD1 , OMD , IGFBP4 , LAMB2 , DUSP4 , of genes based on the gene expression levels ( in a biological PDLIM5 , IRS2 , and CX3CR1 . sample ) as determined by a sequencing platform may be [ 0335 ] As another example, some embodiments involve provided as input to a statistical model trained to predict using a gene set for predicting kidney clear cell cancer cancer grade for the biological sample. The predicted cancer grade . Genes in the gene set may be selected from the group grade may be compared against asserted cancer grade as part consisting of: PLTP, CIS , LY96 , TSKU , TPST2 , SER of the quality control techniques described herein . PINF1 , SRPX2 , SAA1, CTHRC1 , GFPT2 , CKAP4 , SER [ 0330 ] In some embodiments, the set of genes being PINA3, CFH , PLAU , BASP1 , PTTG1 , MOCOS , LEF1 , ranked depends on the particular biological characteristic of SLPI , PRAME, STEAP3 , LGALS2 , CD44 , FLNC , UBE2C , interest. For example , one set of genes may be used for CTSK , SULF2 , TMEM45A , FCGR1A , PLOD2 , C19orf80 , US 2021/0005284 A1 Jan. 7 , 2021 35

PDGFRL , IGF2BP3 , SLC7A5 , PRRX1, RARREST , [ 0340 ] In some embodiments, the quality of sequence LHFPL2 , KDELR3 , TRIB3 , IL20RB , FBLN1 , KMO , CIR , information from one or more ( e.g. , at least two ) nucleic acid CYP1B1 , KIF2A , PLAUR , CKS2 , CDCP1 , SFRP4 , HAMP, samples is evaluated by ( a ) determining a gene expression MMP9 , SLC3A1 , NATO , FRMD3 , NPR3 , NAT8B , BBOX1 , level for two different subunits of a known protein ; and ( b ) SLC5A1 , GBA3 , EMCN , SLC47A1 , AQP1 , PCK1 , determining an expression ratio for the two different sub UGT2A3 , BHMT, FMO1 , ACAA2 , SLC5A8 , SLC16A9 , units . In some embodiments, if the determined expression TSPAN18 , SLC17A3 , STK32B , MAP7, MYLIP , ratio does not match the expected expression ratio for the SLC22A12 , LRP2, CD34 , PODXL , ZBTB42 , TEK , FBP1 , protein subunits , the sequence information is identified as and BCL2 . suspect , of insufficient quality, and is removed , discarded , [ 0336 ] Aspects of using statistical models for predicting retested , and / or reported as such to a user. In some embodi tissue of origin , cancer grade , and / or any other characteris ments , if the determined expression ratio matches an tics of a biological sample are described in the U.S. Provi expected expression ratio for the protein subunits , the sional Patent Application Ser . No. 62 / 943,976 , titled sequence information is deemed of sufficient quality for “ Machine Learning Techniques for Gene Expression Analy further analysis, and is retained , and / or reported as such to sis ” , filed on Dec. 5 , 2019 , which is incorporated by refer a user . ence herein in its entirety . [ 0341 ] In some embodiments , the quality of sequence [ 0337 ] Returning to aspects of evaluating quality of information from one or more ( e.g. , at least two) nucleic acid sequence information , in some embodiments, the quality of samples is evaluated by determining a Phred Score for the sequence information from one or more ( e.g. , at least two ) sequence information. In some embodiments, if the Phred nucleic acid samples is evaluated by determining the pres Score is less than 27 , the sequence information is deemed ence or absence of polyadenylated RNA genes to predict suspect , of insufficient quality, and is removed , discarded , whether the sequence information was obtained from polyA retested , and / or reported as such to a user . In some embodi RNA or not . In some embodiments , if there is disagreement ments, if the Phred Score is less than 20 , the sequence between the predicted and expected ( e.g. , asserted ) polyA information is removed , discarded , retested , and / or reported status of one or more samples, the sequence information for as such to a user . In some embodiments , if the Phred Score those samples is deemed as suspect , of insufficient quality, is more than 20 and is less than 27 , the sequence information and is removed , discarded , retested , and / or reported as such is deemed close to be removed , discarded , retested , and /or to a user. In some embodiments, if there is agreement reported as such to a user . In some embodiments , if the between the predicted and expected polyA status for one or Phred Score is at least 27 , the sequence information is more nucleic acid samples, the sequence information is deemed of sufficient quality for further analysis , and is deemed of sufficient quality , and is retained , and / or reported retained , and / or reported as such to a user . as such to a user . [ 0342 ] In some embodiments, the quality of sequence [ 0338 ] In some embodiments, the quality of sequence information from one or more ( e.g. , at least two ) nucleic acid information from one or more ( e.g. , at least two) nucleic acid samples is evaluated by determining a GC content for the samples is evaluated by determining a complexity value of sequence information . In some embodiments, if the GC the sequence information . In some embodiments , determin content is at least 30 % , and less than or equal to 55 % , the ing a complexity value comprises determining the number of sequence information is deemed of sufficient information for duplications. In some embodiments, the % duplications can further analysis, and is retained , and / or reported as such to be determined for a DNA or RNA library . In some embodi a user. In some embodiments, if the GC content is in the ments, if a large percentage of the library is duplicated , range of 45-65 % , the sequence information is deemed of either a library of low complexity or over - amplification of sufficient information for further analysis , and is retained , the DNA or cDNA fragments is indicated . In some instances , and /or reported as such to a user ( i.e. , acceptable ). In some differences between libraries in the complexity or amplifi embodiments , GC content of at least 50 % ( e.g. , 50 % , 51 % , cation indicates that certain biases in the data are introduced 60 % ) is the target value at least for human samples . ( e.g. , differing % GC content ). In some embodiments , if the [ 0343 ] In some embodiments, at least two ( e.g. , 3 , 4 , 5 , 6 , complexity value is less than 75 % , or less than 80 % , the 7 , 8 , 9 , 10 , or more ) distinct methods for evaluating the sequence information is deemed suspect , of insufficient quality ( e.g. , source and / or integrity ) of sequence informa quality , and is removed , discarded , retested , and / or reported tion are performed . In some embodiments, the methods as such to a user . In some embodiments , if the complexity performed herein evaluate sequence information from a value is at least 80 % , or at least 85 % , the sequence infor mammal . In some embodiments , the mammal is human . mation is deemed of sufficient quality for further analysis , [ 0344 ] In some embodiments, the subject from which the and is retained , and / or reported as such to a user . sample which generated the sequence information has , is [ 0339 ] In some embodiments, the quality of sequence suspected of having, or is at risk of having a disorder. In information from one or more ( e.g. , at least two) nucleic acid some embodiments , the disorder is cancer . samples is evaluated by predicting a tissue source for the [ 0345 ] In some embodiments, a report is generated com nucleic acid . In some embodiments, if there is disagreement prising the one or more features or results of the methods between a predicted and an asserted tissue source for the described herein . In some embodiments , the report further nucleic acid , the sequence information is deemed suspect , of comprises an analysis of the results of the methods described insufficient quality , and is removed , discarded , retested , or herein . reported as such . In some embodiments , if there is agree [ 0346 ] In some embodiments, the methods or processes of ment between the predicted and asserted tissue sources , the the disclosure may be carried out on a system or computer sequence information is deemed of sufficient quality for processor ( e.g. , laptop , desktop, server , or other computer further analysis, and is retained , and / or reported as such to ized machine ) . The components of the system may reside in a user. disparate places and communicate over networks, such as US 2021/0005284 A1 Jan. 7 , 2021 36 local area networks or wide area networks, or by internet [ 0356 ] Next, process 210 proceeds to act 213 , where protocols. The system may interface with the user via a non - coding transcripts from the annotated RNA expression web - enable browsers and graphical user interfaces ( GUI ) . data are removed to obtain filtered RNA expression data . In some embodiments, the system is under the control of the Aspects relating to removing non - coding transcripts from user in one place . In some embodiments, the system is the annotated RNA expression data are described in the comprised of components not in one place and which may section titled " Removing non - coding transcripts ." not be under the direct control of the user . In some embodi [ 0357 ] Next, process 210 proceeds to act 214 , where the ments , the information of the system is stored locally . filtered RNA expression data is normalized to obtain gene [ 0347 ] As described herein , the term “ process , " " act, ” expression data . The gene expression data may be in tran “ step ” or the variations thereof that are used in computerized scripts per kilobase million ( TPM ) format. Aspects of nor processes or flow charts therein can be used interchangeably , malizing the filtered RNA expression data to gene expres unless indicated otherwise . sion data in transcripts per kilobase million ( TPM ) are [ 0348 ] As described herein , the term “ patient , ” “ subject, ” described in the section called “ Conversion to TPM and " human subject” or the variations thereof can be used gene aggregation . ” interchangeably, unless indicated otherwise . [ 0358 ] Next, process 210 proceeds to act 215 , where at [ 0349 ] FIG . 6A is a flow chart showing illustrative com least one gene that introduces bias in the gene expression puterized Process 200 for performing non - stranded RNA data is identified . Aspects of identifying at least one gene sequencing with the coding RNA enrichment. Process 200 that introduces bias in the gene expression data are described begins at act 201 , where a first sample of a first tumor from in the section called “ Removing bias . ” a subject having, suspected of having, or at risk of having [ 0359 ] Next, process 210 proceeds to act 216 , where cancer is obtained . Further aspects relating to obtaining a expression data associated with the at least one gene that first sample of a first tumor from a subject having , suspected introduces bias is removed from the gene expression data to of having, or at risk of having cancer are provided in section obtain bias - corrected gene expression data . Aspects of “ Biological samples . " removing the expression data , associated with the at least [ 0350 ] Next process 200 proceeds to act 202 , wherein one gene that introduces bias into the gene expression data , RNA from the first sample of the first tumor is extracted . from the gene expression data to obtain bias - corrected gene Aspects relating to extracting RNA from the first sample of expression data are described in the section called “ Remov the first tumor are described in the section called “ Extraction ing bias .” of DNA and / or RNA .” [ 0360 ] Next, process 210 proceeds to act 217 , where a [ 0351 ] Next the process 200 proceeds to act 203 , where cancer treatment for the subject using the bias - corrected the extracted RNA is enriched for coding RNA to obtain gene expression data is identified . Aspects relating to iden enriched RNA . Aspects relating to enriching the extracted tifying a cancer treatment for the subject using the bias RNA for coding RNA to obtain enriched RNA are described corrected gene expression data are described in the section in the section called " RNA enrichment. ” called “ Identifying a cancer treatment. ” [ 0352 ] Next process 200 proceeds to act 204 , where a first [ 0361 ] FIG . 6C is a flow chart showing computerized library of cDNA fragments from the enriched RNA for process 220 for identifying a cancer treatment for the subject non - stranded RNA sequencing is prepared . Aspects relating having, suspected of having, or at risk of having cancer to preparing a first library of DNA fragments from the using the bias -corrected gene expression data . Process 220 enriched RNA for non - stranded RNA sequencing are begins at act 221 , where RNA for coding RNA in a sample described in the section called “ Library preparation for RNA of extracted RNA from a first tumor sample from a subject sequencing . " having, suspected of having, or at risk of having cancer is [ 0353 ] Next process 200 proceeds to act 205 , where non enriched . Aspects relating to enriching RNA for coding stranded RNA sequencing is performed on the first library of RNA in a sample of extracted RNA are described in the cDNA fragments prepared from the enriched RNA . Aspects section called “ Extraction of DNA and / or RNA .” relating to performing non - stranded RNA sequencing on the [ 0362 ] Next, process 220 proceeds to act 222 , where first library of DNA fragments prepared from the enriched non - stranded RNA sequencing on a first library of cDNA RNA are described in the section called “ RNA sequencing . ” fragments prepared from the enriched RNA to obtain RNA It should be appreciated that one or more acts of process 200 expression data is performed . Aspects relating to performing may be optional . non - stranded RNA sequencing on a first library of cDNA [ 0354 ] FIG . 6B is a flow chart showing computerized fragments prepared from the enriched RNA to obtain RNA process 210 for identifying a cancer treatment by obtaining expression data are described in the section called “ RNA bias - corrected gene expression data . Process 210 begins at sequencing ." act 211 , where RNA expression data for a subject having, [ 0363 ] Next, process 220 proceeds to act 223 , where the suspected of having, or at risk of having cancer is obtained . RNA expression data is converted to gene expression data . Aspects relating to obtaining RNA expression data are Next, process 220 proceeds to act 224 , where at least one described in the section called “ Obtaining RNA expression gene that introduces bias in the gene expression data is data . ” identified . Next , process 220 proceeds to act 225 , where [ 0355 ] Next , process 210 proceeds to act 212 , where genes expression data associated with the at least one gene that in the RNA expression data are aligned to a reference and the introduces bias is removed from the gene expression data to RNA expression data is annotated . Aspects relating to align obtain bias - corrected gene expression data . Aspects relating ing and annotating genes in the RNA expression data with to acts 223 , 224 , and 225 are described in the section called known sequences of the human genome to obtain annotated " Removing bias. " RNA expression data are described in the section called [ 0364 ] Next, process 220 proceeds to act 226 , where a “ Alignment and annotation .” cancer treatment for the subject using the bias - corrected US 2021/0005284 A1 Jan. 7 , 2021 37 gene expression data is identified . Aspects relating to iden [ 0368 ] FIG . 8 illustrates a non - limiting process pipeline tifying a cancer treatment for the subject using the bias 800 FIG . 8 illustrates a non - limiting process pipeline 800 for corrected gene expression data are described in the section processing and validating sequence data and asserted infor called “ Identifying a cancer treatment . ” mation associated with the sequence data for subsequent [ 0365 ] FIG . 7 is an exemplary flow chart showing a analysis ( e.g. , for diagnostic , prognostic, therapeutic, and / or computerized process 300 for preparing patient samples for other clinical applications ). Act 801 is performed by obtain sequencing analysis and performing bioinformatics quality ing nucleic acid data comprising sequence data and asserted control, so that a cancer treatment suitable for the patient or information indicating an asserted source for the sequence subject, from which the nucleic acids are extracted for data . In some embodiments, the nucleic acid data is obtained sequencing analysis, can be obtained . from a biological sample that was previously processed . In some embodiments , the biological sample was previously [ 0366 ] In the illustrated embodiment, the process 300 obtained from a subject having, suspected of having, or at comprises obtaining a first sample of a first tumor from a risk of having cancer . In some embodiments, act 801 is subject having , suspected of having, or at risk of having performed by obtaining nucleic acid data comprising an cancer at act 301 , extracting RNA from the first sample of asserted integrity of the sequence data . In some embodi the first tumor at act 302 , enriching the RNA for coding ments, act 801 can be performed by obtaining nucleic acid RNA to obtain enriched RNA at act 303 , preparing a first data comprising sequence data and asserted information library of cDNA fragments from the enriched RNA for indicating an asserted source and an asserted integrity of the non - stranded RNA sequencing at act 304 , obtaining RNA sequence data . In some embodiments, the asserted informa expression data for the subject at act 305 , aligning and tion indicates the asserted integrity of the sequence data . In annotating genes in the RNA expression data with known some embodiments, the asserted information is indicative of sequences of the human genome to obtain annotated RNA a subject from whom the nucleic acid was obtained . For expression data at act 306 , removing non - coding transcripts example , in some embodiments, asserted information com from the annotated RNA expression data at act 307 , con prises MHC allele information and / or SNP information for verting the annotated RNA expression data to gene expres one or more loci of the subject. After act 801 , process 800 sion data ( e.g. , in transcripts per kilobase million ( TPM ) proceeds to acts 802 and 803 where the nucleic acid data format) at act 308 , identifying at least one gene that intro obtained at act 801 is validated . The validation comprises duces bias in the gene expression data at act 309 , removing processing the sequence data at act 802 to obtain a deter expression data for the at least one gene that introduces bias mined integrity and /or a determined source and determining from the gene expression data to obtain bias - corrected gene whether the determined integrity and / or determined source expression data at act 310 , obtaining sequence information matches the asserted integrity and / or asserted source , and asserted information at act 311 , determining one or more respectively, in act 803. The sequence data is processed in features from the sequence information at act 312 , deter act 802 to obtain determined information indicating a deter mining whether one or more features match asserted infor mined source of the sequence data in act 802a and / or mation at act 313 , making at least one additional determi determined information indicating a determined integrity of nation of the features at act 314 , and identifying a cancer the sequence data in act 802b . In some embodiments , act treatment for the subject using the bias - corrected gene 802a may comprise determining information indicative of at expression data at act 315 . least one , two, three of the MHC genotype of the subject, [ 0367 ] It should be appreciated that one or more acts of whether the nucleic acid data is RNA data or DNA data , a process 300 may be optional . For example , in some embodi tissue type of the biological sample , a tumor type of the ments, acts 301 and 303 may be performed and act 303 is biological sample, a sequencing platform used to generate optional . In some embodiments, acts 301 , 302 , and 303 are the sequence data , SNP concordance ( e.g. , determining all performed . In some embodiments, acts 301 , 302 , and 303 whether one or more SNPs in the sequence data match one are all omitted, whereas the remaining acts are performed . or more SNPs in a reference sequence ), and / or a whether an This may be useful when the extracted , enriched RNA from RNA sample is polyA enriched . In some embodiments, act the patient sample is already available prior to the start of 802b may comprise determining a first level of a first nucleic process 300. In some embodiments, the one or more features acid encoding a first subunit of a multimeric protein , deter at act 312 comprises one or more of the following features: mining a second level of a second nucleic acid encoding a source , patient, tissue type, tumor type, polyA status , MHC second subunit of a multimeric protein , and determining sequence , protein subunit ratio , complexity, contamination , whether a ratio between the first level and the second level coverage , exon coverage , read composition , Phred score , matches an expected ratio . In some embodiments the first SNP concordance, and GC content. In some embodiments , subunit and the second subunits are first and second CD3 the one or more features at act 312 further comprise strand subunits, first and second CD8 subunits, or first and second edness of RNA sequence analysis . In some embodiments, CD79 subunits. In some embodiments , the determined infor any one or more of the features at act 312 can be determined . mation indicative of the determined integrity is indicative of In some embodiments, the additional determination of the at least one , two, three of total sequence coverage , exon features at act 314 can include but are not limited to coverage , chromosomal coverage, a ratio of nucleic acids concordance value of SNPs , contamination value , polyA encoding two or more subunits of a multimeric protein , status, complexity value , Phred score , and GC content. In species contamination , complexity, and / or guanine ( G ) and some embodiments, the additional determination of any one cytosine ( C ) percentage ( % ) of the sequence data . In some or more of the features may be performed at act 314. In some embodiments, act 803 comprises determining one or more embodiments, any one or more of acts 303 , process 307 , and MHC allele sequences from the sequence data and deter process 314 may be omitted . In some embodiments, all acts mining whether the one or more MHC alleles sequences of the computerized process 300 may be performed . match the asserted MHC allele information for the subject. US 2021/0005284 A1 Jan. 7 , 2021 38

In some embodiments, determining MHC allele comprises mine whether it is at an expected expression level based on determining sequences for six MHC loci from the sequence the expected cell , tissue , or tumor that is being analyzed . data . [ 0376 ] In some embodiments, expression levels of one or [ 0369 ] In act 803 the determined integrity and / or source is more genes are analyzed for each of a plurality of samples evaluated by determining whether the determined source of ( e.g. , 2 , 3 , 4 , 5 , 4-10 , 1-50 , 50-500 , or more samples ). If the the sequence data matches the asserted source of the expression of one or more genes is lower or higher than sequence data and / or whether the determined integrity of the expected , this may be indicative that the quality and / or sequence data matches the asserted integrity of the sequence source / origin of the data being analyzed is not what was data . expected . In some embodiments, data from a sample that has [ 0370 ] If the asserted and determined information match an unexpected level ( e.g. , a lower or higher than expected in act 803 ( i.e. , yes ) , process 800 proceeds to act 804 where level ) of expression for one or more genes is excluded from the sequence data is further evaluated to determine whether further analysis . In some embodiments , new sequence infor the sequence data is indicative a diagnostic, prognostic , mation is obtained for a sample that has an unexpected level therapeutic, or other clinical outcome. For example, in some of expression for one or more genes, for example to confirm embodiments the sequence data is further processed in act whether the initial data was correct . In some embodiments, 804 to provide a recommendation for a cancer treatment for a sample that has an unexpected level of expression for one a subject having, suspected of having , or at risk of having or more genes can be further analyzed , for example to cancer. In some embodiments, Act 804 is performed by determine whether the sample was from a different source determining a therapy for the subject and the therapy is than initially indicated . subsequently administered to the subject. [ 0377 ] In some embodiments, expression levels for one or more genes were analyzed ( e.g. , using tSNE , PCA , or other [ 0371 ] In some embodiments, a process may further com technique) to determine whether gene expression or patterns prise administering the therapy to the subject. In some of gene expression were similar or different in separate embodiments, the therapy is a cancer therapy . samples. In some embodiments, if datasets comprising the [ 0372 ] In some embodiments, determining the therapy for same cell type or same tissue type did not cluster within a the subject may include determining a plurality of gene group , or if one or more datasets were identified as statis group expression levels comprising a gene group expression tically different from other datasets comprising the same level for each gene group in a set of gene groups . In some cells or tissue , then the dataset ( s ) identified as different could embodiments, the set of gene groups comprises at least one be excluded , further analyzed , or flagged as potentially gene group associated with cancer malignancy, and at least suspect . In some embodiments , additional sequence data can one gene group associated with cancer microenvironment. be obtained for a sample identified as potentially suspect . The therapy for the subject is identified by using the deter [ 0378 ] An illustrative implementation of a computer sys mined gene group expression levels . tem 500 that may be used in connection with any of the [ 0373 ] If the asserted and determined information do not embodiments of the technology described herein is shown in match in act 803 ( i.e. , no ), process 800 proceeds to 805 , FIG . 9 The computer system 500 includes one or more where one or more remedial action ( s ) are performed . In processors 510 and one or more articles of manufacture that some embodiments, a remedial action comprises generating comprise non - transitory computer -readable storage media an indication that the determined information does not ( e.g. , memory 520 and one or more non -volatile storage match the asserted information , generating an indication to media 530 ) . The processor 510 may control writing data to not process the sequence data in a subsequent analysis , and reading data from the memory 520 and the non - volatile and / or generating an indication to obtain additional storage device 530 in any suitable manner, as the aspects of sequence data and / or other information about the biological the technology described herein are not limited in this sample and / or the subject. respect. To perform any of the functionality described [ 0374 ] In some embodiments, a method comprises all acts herein , the processor 510 may execute one or more proces illustrated in FIG . 8. However, in some embodiments, a sor - executable instructions stored in one or more non subset of the acts is performed and any one or more of the transitory computer - readable storage media ( e.g. , the acts may be omitted , duplicated , and / or performed in a memory 520 ) , which may serve as non -transitory computer different order than illustrated in FIG . 8. For example , either readable storage media storing processor - executable instruc act 802a or act 802b is performed in act 802. For example , tions for execution by the processor 510 . act 803 can be performed twice to confirm the decision . For [ 0379 ] Computing device 500 may also include a network example, one or more acts in process 800 can be performed input / output ( I / O ) interface 540 via which the computing after the one or more remedial action ( s ) in act 805. In some device may communicate with other computing devices embodiments, one or more acts of FIG . 8 are implemented ( e.g. , over a network ), and may also include one or more on a computer. user I / O interfaces 550 , via which the computing device may [ 0375 ] In some embodiments, expression levels of one or provide output to and receive input from a user . The user I / O more genes in a sample are analyzed to evaluate the origin interfaces may include devices such as a keyboard , a mouse , and / or quality of the sample. For example, the expression of a microphone , a display device ( e.g. , a monitor or touch one or more genes that are known to be expressed in a screen ) , speakers, a camera , and / or various other types of particular cell , tissue , or tumor type is evaluated to deter I / O devices . mine whether it is at an expected expression level based on [ 0380 ] The embodiments described herein , can be imple the expected cell , tissue , or tumor that is being analyzed . mented in any of numerous ways . For example, the embodi Similarly, the expression of one or more genes that are ments may be implemented using hardware, software or a known not to be expressed ( or not highly expressed ) in a combination thereof. When implemented in software , the particular cell , tissue , or tumor type is evaluated to deter software code can be executed on any suitable processor US 2021/0005284 A1 Jan. 7 , 2021 39

( e.g. , a microprocessor ) or collection of processors, whether and / or any other suitable type of network . Any of the devices provided in a single computing device or distributed among shown in FIG . 10 may connect to the network 610 using one multiple computing devices. It should be appreciated that or more wired links, one or more wireless links, and / or any any component or collection of components that perform the suitable combination thereof. functions described above can be generically considered as [ 0386 ] In the illustrated embodiment of FIG . 10 , the at one or more controllers that control the above - described least one database 620 may store expression data and or functions. The one or more controllers can be implemented sequence information for the subject ( e.g. , patient ), medical in numerous ways , such as with dedicated hardware, or with history data for the subject ( e.g. , patient ), test result data for general purpose hardware ( e.g. , one or more processors ) that the subject ( e.g. , patient) , and /or any other suitable infor is programmed using microcode or software to perform the mation about the subject 680. Examples of stored test result functions recited above . data for the subject ( e.g. , patient) include biopsy test results , [ 0381 ] In this respect, it should be appreciated that one imaging test results ( e.g. , MRI results ) , and blood test implementation of the embodiments described herein com results . The information stored in at least one database 620 prises at least one computer - readable storage medium ( e.g. , may be stored in any suitable format and / or using any RAM , ROM , EEPROM , flash memory or other memory suitable data structure ( s ), as aspects of the technology technology, CD - ROM , digital versatile disks ( DVD ) or other described herein are not limited in this respect. The at least optical disk storage , magnetic cassettes, magnetic tape , one database 620 may store data in any suitable way ( e.g. , magnetic disk storage or other magnetic storage devices, or one or more databases , one or more files ). The at least one other tangible , non -transitory computer -readable storage database 620 may be a single database or multiple databases . medium ) encoded with a computer program ( i.e. , a plurality [ 0387 ] As shown in FIG . 10 , illustrative environment 600 of executable instructions) that, when executed on one or includes one or more external databases 620 , which may more processors , performs the above -described functions of store information for patients other than patient 680. For one or more embodiments . The computer - readable medium example, external databases 660 may store expression data may be transportable such that the program stored thereon and / or sequence information ( of any suitable type ) for one or can be loaded onto any computing device to implement more patients, medical history data for one or more patients, aspects of the techniques described herein . In addition , it test result data ( e.g. , imaging results , biopsy results , blood should be appreciated that the reference to a computer test results ) for one or more patients, demographic and /or program which , when executed , performs any of the above biographic information for one or more patients, and / or any described functions, is not limited to an application program other suitable type of information . In some embodiments , running on a host computer. Rather, the terms computer external database ( s ) 660 may store information available in program and software are used herein in a generic sense to one or more publicly accessible databases such as TCGA reference any type of computer code ( e.g. , application ( The Cancer Genome Atlas ), one or more databases of software , firmware , microcode, or any other form of com clinical trial information , and / or one or more databases puter instruction ) that can be employed to program one or maintained by commercial sequencing suppliers. The exter more processors to implement aspects of the techniques nal database ( s ) 660 may store such information in any described herein . suitable way using any suitable hardware , as aspects of the [ 0382 ] Aspects of the technology described herein provide technology described herein are not limited in this respect. computer implemented methods for evaluating , generating, [ 0388 ] In some embodiments, the at least one database visualizing, and /or classifying biological characteristic ( s ) of 620 and the external database ( s) 660 may be the same sequence information of ( e.g. , cancer grade, tissue of origin ) database , may be part of the same database system , or may of subjects ( e.g. , cancer patients ) or those having, suspected be physically co - located , as aspects of the technology of having , or at risk of having a disorder ( e.g. , cancer) . described herein are not limited in this respect. [ 0383 ] In some embodiments, a software program may [ 0389 ] For example, in some embodiments , server ( s ) 640 provide a user with a visual representation of a subject ( e.g. , may access information stored in database ( s ) 620 and / or 660 patient) ' s characteristic ( s ) and / or other information related and use this information to perform processes described to a subject ( e.g. , patient ) ' s cancer using an interactive herein , described with reference to FIG . 10 , for determining graphical user interface (GUI ) . Such a software program one or more characteristics of a biological sample and / or of may execute in any suitable computing environment includ the sequence information . ing , but not limited to , a cloud - computing environment, a [ 0390 ] In some embodiments , server ( s ) 640 may include device co - located with a user ( e.g. , the user's laptop , desk one or multiple computing devices. When server ( s) 640 top , smartphone, etc. ) , one or more devices remote from the include multiple computing devices , the device ( s ) may be user ( e.g. , one or more servers ), etc. physically co - located ( e.g. , in a single room ) or distributed [ 0384 ] For example, in some embodiments, the techniques across multi -physical locations . In some embodiments, serv described herein may be implemented in the illustrative er ( s ) 640 may be part of a cloud computing infrastructure. In environment 600 shown in FIG . 10. As shown in FIG . 10 , some embodiments, one or more server ( s ) 640 may be within illustrative environment 600 , one or more biological co - located in a facility operated by an entity ( e.g. , a hospital , samples of a subject 680 may be provided to a laboratory research institution ) with which doctor 650 is affiliated . In 670. Laboratory 670 may process the biological sample ( s ) to such embodiments , it may be easier to allow server ( s) 640 obtain expression data ( e.g. , DNA , RNA , and / or protein to access private medical data for the patient 880 . expression data ) and /or sequence information and provide it , [ 0391 ] As shown in FIG . 10 , in some embodiments, the via network 610 , to at least one database 660 that stores results of the analysis performed by server ( s) 640 may be information about subject ( e.g. , patient) 680 . provided to doctor 650 through a computing device 630 [ 0385 ] Network 610 may be a wide area network ( e.g. , the (which may be a portable computing device , such as a laptop Internet ), a local area network ( e.g. , a corporate Intranet ), or smartphone, or a fixed computing device such as a US 2021/0005284 A1 Jan. 7 , 2021 40 desktop computer ). The results may be provided in a written [ 0396 ] Due to these properties, e.g. , MHCs are highly report, an e - mail , a graphical user interface, and / or any other polymorphic, co - dominance , and that there are a large suitable way. It should be appreciated that although in the number of alleles that may be present in a given species , the embodiment of FIG . 10 , the results are provided to a doctor MHC profile of a subject is highly specific and unique . Thus , 650 , in other embodiments , the results of the analysis may it is extremely unlikely that two people , except for identical be provided to patient 680 or a caretaker of patient 680 , a twins, will possess cells with the same set of MHC mol healthcare provider such as a nurse , or a person involved ecules . Accordingly, by evaluating the sequence of the MHC with a clinical trial. profile of sequence information , it can be used to corrobo [ 0392 ] In some embodiments, the results may be part of a rate , or disqualify, identifying information between the graphical user interface (GUI ) presented to the doctor 650 sequence information , an asserted information , other via the computing device 630. In some embodiments , the sequence information , or a combination thereof. GUI may be presented to the user as part of a webpage [ 0397 ] In some embodiments, one MHC allele is used for displayed by a web browser executing on the computing the evaluation . In some embodiments , at least two MHC device 630. In some embodiments, the GUI may be pre alleles are used for the evaluation . In some embodiments, at sented to the user using an application program ( different least three MHC alleles are used for the evaluation . In some from a web - browser ) executing on the computing device embodiments, at least four MHC alleles are used for the 630. For example, in some embodiments , the computing evaluation . In some embodiments , at least five MHC alleles device 630 may be a mobile device ( e.g. , a smartphone) and are used for the evaluation . In some embodiments, at least the GUI may be presented to the user via an application six MHC alleles are used for the evaluation . program ( e.g. , " an app ” ) executing on the mobile device. [ 0398 ] In some embodiments , the evaluated feature is a concordance value of single nucleotide polymorphisms [ 0393 ] The GUI presented on computing device 630 may ( SNPs ) . “ SNP ” or “ Single Nucleotide Polymorphism , ” as provide a wide range of oncological data relating to both the used herein , refers to a difference in a nucleic acid sequence patient and the patient's cancer in a new way that is compact ( e.g., genome, sequence data set) at a single nucleotide ( e.g. , and highly informative. Previously, oncological data was adenine ( A ) , thymine ( T ) , cytosine ( C ) , and / or guanine ( G ) ) obtained from multiple sources of data and at multiple times shared between subjects of a species or within an individual making the process of obtaining such information costly subject on paired . SNPs can be , or represent: from both a time and financial perspective. Using the changed nucleotides ( e.g. , A changed to T, G changed to A , techniques and graphical user interfaces illustrated herein , a etc. ) , known as a substitution ; removed nucleotides , wherein user can access the same amount of information at once with the nucleotide is absent from the sequence entirely, known less demand on the user and with less demand on the as a deletion ; or added nucleotides , wherein an additional computing resources needed to provide such information . nucleotide is added to the sequence . SNPs can lead to Low demand on the user serves to reduce clinician errors changes in an encoded protein ( e.g. , nonsynonymous SNPs ) , associated with searching various sources of information . or not ( e.g. , synonymous ). Further, when the SNP is non Low demand on the computing resources serves to reduce synonymous, it can cause a change in the encoded amino processor power, network bandwidth , and memory needed acid ( e.g. , missense ) or cause a premature stop codon ( e.g. , to provide a wide range of oncological data, which is an nonsense ). Synonymous SNPs can also alter the message of improvement in computing technology. In some embodi the nucleic acid sequence by influencing or changing the ments , the reports of the disclosure are presented to a user splice sites , transcription factor binding , and / or messenger by means of a system or by means of a GUI. RNA (mRNA ) binding . These mutations ( e.g. , changes to [ 0394 ] Accordingly, in an aspect , the disclosure relates to the protein encoding abilities of the sequence) can cause a a method of evaluating sequence information, to determine litany of effects including differences in phenotypes as well at least one feature . The evaluation can take place on a as various disease types. Moreover, SNPs occur in great computer or other automated machine capable of carrying numbers within a subject's genome , with some estimates out programmable instructions or can be performed manu being that a typical genome differs from the reference human ally by an evaluator. The features can be used to generate a genome at between 4 and 5 million sites , of which more than report for informing the evaluator of the at least one feature 99.9 % are SNPs . of the sequence information . In some embodiments, the [ 0399 ] Since SNPs are encoded in nucleic acids which are feature is the sequence of the MHC alleles of the sequence part of the genome, they are passed from parent to progeny information . (both subjects, and within a subject when nucleic acids [ 0395 ] The major histocompatibility complex ( MHC ) ( re replicate ) . Accordingly, because of this stable inheritance , ferred to as the Human Leukocyte Antigens ( HLAs ) in and because of the large number thereof, SNPs can be used ) is the mechanism by which the immune system is as a genetic marker of the relatedness of subjects, and also able to differentiate between self and nonself cells . It is a as a measure of the identity of two nucleic acid sequences as collection of glycoproteins ( proteins with a carbohydrate ) originating from the same subject. In some embodiments , that exist on the plasma membranes of nearly all body cells . the SNP concordance value is determined between the The ( MHC ) are highly polymorphic genes that are important sequence information and a reference sequence . In some in the immune system of biological organisms and originate embodiments , the SNP concordance value is determined from 20 genes , with more than 50 variations per gene between the sequence information and an asserted value . In between individuals, and allow for co - dominance between some embodiments , the SNP concordance value must be alleles . These glycoproteins are part of a pathway which equal to , or greater than a threshold value to be acceptable enables the immune system to identify self and non - self cells ( e.g. , deemed of sufficient quality and integrity ) for use in by aberrations in the MHC displayed on the plasma mem further analyses . In some embodiments , the threshold value brane . is 80 % . In some embodiments, a SNP concordance value is US 2021/0005284 A1 Jan. 7 , 2021 41 determined between a sequence data set and a subject, [ 0401 ] In some embodiments , the evaluated feature is a wherein if the SNP concordance value is at least 70 % ( e.g. , tumor type. at least 71 % , at least 72 % , at least 73 % , at least 74 % , at least [ 0402 ] In some embodiments , the evaluated feature is a 75 % , at least 76 % , at least 77 % , at least 78 % , at least 79 % , tissue type . at least 80 % , at least 81 % , at least 82 % , at least 83 % , at least [ 0403 ] In some embodiments , the evaluated feature is the 84 % , at least 85 % , at least 86 % , at least 87 % , at least 88 % , polyadenylation status of the sequence information . “ Poly at least 89 % , at least 90 % , at least 91 % , at least 92 % , at least adenylated ” or “ PolyA ” as used herein , refers to the series 93 % , at least 94 % , at least 95 % , at least 95.5 % , at least 96 % , of multiple adenosine monophosphate nucleotides attached at least 96.5 % , at least 97 % , at least 97.5 % , at least 98 % , at to the 3 ' - end of messenger RNA (mRNA ), this occurs after least 98.5 % , at least 99 % , at least 99.5 % , at least 99.6 % , at transcription and cleavage of the 3 ' - end to of the transcript to free a hydroxyl. The “ polyA tail, ” as it is often referred , least 99.7 % , at least 99.8 % , at least 99.9 % , at least 99.95 % , is a characteristic of fully processed mRNA and assists in at least 99.99 % , at least 99.999 % or more ), it is deemed to various cellular processes. For example , the polyA tail is the be sufficiently likely to be from the subject and identified as for a protein ( e.g. , polyA - binding protein ) being from the subject. As described herein , in some which promotes export from the nucleus of the cell so that embodiments , the determination of the concordance value is translation may occur, as well as effects translation and as described in the present disclosure . SNP concordance can stability of mRNA . If only transcripts of protein coding be performed by any means available or known in the art, for mRNA are present, it is highly likely ( e.g. , indicative ) that example SNP concordance is performed by a variety of the sample that generated the sequence information was online tools such as Conpair (github.com/nygenome/Con generated using mRNA -Seq . In some embodiments, the pair ) or GATK GenotypeConcordance ( software.broadinsti polyA status indicates mRNA - Seq was not used ( e.g. , the tute.org/gatk/documentation/tooldocs/3.8-0/org_broadinsti whole transcriptome was used ) . In some embodiments , the tute_gatk_tools_walkers_variantutils_ polyA status is evaluated against an asserted information . In GenotypeConcordance.php ) or may be calculated manually. some embodiments, the polyA status is evaluated against a In other examples, SNP concordance can be performed by reference sequence . In some embodiments, the probability tools as described at publicly available websites ( genome . of the sequence information being generated using either sph.umich.edu/wiki/VerifyBamID or software.broadinsti mRNA - Seq or the whole transcriptome must be above a tute.org/cancer/cga/contest ). threshold . In some embodiments , the threshold is a reference [ 0400 ] In some embodiments, the evaluated feature is a value . In some embodiments , the threshold level is 90 % . In quality score , for example a Phred score . A “ Phred Score ” some embodiments, the threshold value is an asserted infor ( also may be known or referred to herein as a “ Phred Quality mation . In some embodiments, the sequence information is Score ” ) as used herein , refers to a measure of the quality for from a sample which contained primarily polyadenylated the identification of nucleotides sequenced by nucleic acid nucleic acids . In some embodiments, the sequence informa sequencing systems or platforms ( e.g. , NGS ) . Phred Scores tion is from a sample which contained both polyadenylated are known in the art and are often generated from the and non - polyadenylated nucleic acids sequencing platform based upon several parameters ( e.g. , [ 0404 ] In some embodiments , the evaluated feature is the peak shape, resolution , etc.) and a score ( Q ) is assigned to GC content of the sequence information . “ G / C Content " or each nucleotide base call ( For a detailed review of the " guanine ( G ) -cytosine ( C ) content, ” as used herein , refers to calculation refer to Ewing B , Hillier L , Wendl MC , Green the percentage of nucleotides in a nucleic acid sample which P. Base - calling of automated sequencer traces using phred . are either G or C. It can be calculated by summing all of the 1. Accuracy assessment . Genome Res . 1998 March ; 8 ( 3 ) : G and C reads of given sequence information and dividing 175-85 . and Ewing B , Green P. Base - calling of automated by the total nucleotides sequenced . In some embodiments, sequencer traces using phred . II. Error probabilities. the sequence information is evaluated and the GC content is Genome Res . 1998 March ; 8 ( 3 ) : 186-94 . ) . The Phred Score calculated by summing the number of base calls resulting in of each base refers to the likelihood that a nucleotide base a G or C ( e.g. , G + C ) in the sequence information and call is incorrect ( base - calling error probability ( P ) ) and is dividing by the total number of base calls in the sequence determined by the equation Q = -10 log10P. Thus, the score information ( e.g. , number of nucleotides in a sequence data ( e.g. , b ) indicates the base call accuracy, for example a set ), or ( G + C )/ (Number of nucleotides in a sequence data Phred Score of 10 indicates a 90 % call accuracy for the base set ). in question , while a Phred Score of 40 indicates a 99.99 % [ 0405 ] The GC content may also be used as a quality call accuracy for the same base . In some embodiments , the measure of the sequence information . Many known Phred Score of the sequence information is determined and genomes have been sequenced , along with the respective compared to a reference value . In some embodiments , the exomes , transcriptomes, and various other portions thereof reference value is at least 27 , at least 28 , at least 29 , at least ( e.g. , measure of specific RNA components ). Moreover, 30 , or more than 30. In some embodiments, the Phred Score many of these sequences have been sequenced a great is determined and compared to the Phred Score of other number of times and averages and ranges for various com sequence information . In some embodiments , the Phred ponents thereof have been generated , for example, the GC Score is determined and compared to an asserted score . In content of the human genome. The GC content of the human some embodiments , the Phred Score is used as a base level genome is known to vary from approximately 35 % to 60 % , determination of quality . In some embodiments, it is used to and has a mean value ( e.g. , average value ) of about 41 % . compare a sequence with asserted information to compare Accordingly, as a quality measure if a sequence information identity , as if the Phred Scores are different it is unlikely they which is identified as a human genome were to be evaluated are the same sequence information or from the same sample to have a GC content of 75 % , the quality of the sequence or subject. information ( or the sample from which it was derived ) US 2021/0005284 A1 Jan. 7 , 2021 42 would be in question . As a result, the evaluated GC content [ 0409 ] In some embodiments, the evaluated feature is the can be compared to known ranges of GC contents expected coverage value . “ Coverage , " as used herein , refers to the for the sequence information , to a value provided by the number of unique reads of a given nucleotide in a recon sequence information donor, to a value from a database of structed sequence. As the nucleic acid is sequenced , it is not such values , to additional sequence information , or to a sequenced in one entire read ( e.g. , start to finish in one pass ) , reference range for sequence information of a given type to but rather is the result of multiple reads of portions or ascertain if they are consistent or if the GC content indicates segments of the nucleic acid ( e.g. , RNA , exome , genome ) of a problem , which may be due to degradation , residual which have an average length ( L ) , wherein the whole nucleic primers, contamination , or other complications with the acid when reconstructed has an overall length of ( G ) . As the sequence information . number of reads of a nucleic acid increase ( N ), the coverage will also increase . The coverage can be calculated as NxL / G . [ 0406 ] Accordingly, in some embodiments , the GC con In some embodiments , the coverage value is compared to an tent feature is used in a method to evaluate the integrity of asserted information . In some embodiments , the coverage the sequence information by determining a GC content for value is compared to a threshold or reference value . In some each sequence data set being evaluated ; wherein , if the GC embodiments, the threshold or reference value is a statisti content is either: ( i ) less than 30 % ; or greater than 55 % , the cally significant value . In some embodiments, the coverage nucleic acid samples are deemed likely of insufficient qual value is compared to other sequence information . In some ity, and are removed , discarded , retested , or reported as of embodiments, the target value of the coverage for tumor is insufficient quality , and wherein , if the GC content is : at least more than 150x ( e.g. , 170x , 190x ) . In some embodiments, 30 % , and less than or equal to 55 % , the nucleic acid samples the target value of the coverage for normal tissue is more are deemed of sufficient quality and are retained . Addition than 100x ( e.g. , 110x , 120x , 130x ) . Publicly available tools ally, in some embodiments, the GC content may be calcu can be used for determining the coverage value ( github.com/ lated and compared to asserted information . The GC con brentp /mosdepth and biodatageeks.org/sequila/ ). tent, in some embodiments, can be used to match the asserted information and thus corroborate or question the [ 0410 ] The features and evaluations described herein can identity of the sequence information as being the same be evaluated and used individually as well as in conjunction sequence information asserted , being from a given sample , with one another. In some embodiments , at least one feature subject, or specific sample or subject. is evaluated ( e.g. , at least one , at least two, at least three , at least four, or more ). In some embodiments, at least two [0407 ] In some embodiments, the evaluated feature is a features are evaluated ( e.g. , at least three, at least four, or ratio of protein subunit expression ( e.g. , an expression ratio more ). In some embodiments, at least four features are of nucleic acids encoding different subunits of a protein ) . evaluated ( e.g. , at least four, or more ). In some embodi Protein expression can be measured by any means known in ments , at least five features are evaluated ( e.g. , at least five , the art, for example expression can be determined ( e.g. , or more ). In some embodiments, at least six features are quantified ) by counting the number of reads that mapped to evaluated ( e.g. , at least six , or more ). In some embodiments , each locus in the transcriptome. By evaluating the expres at least seven features are evaluated ( e.g. , at least seven , or sion of protein subunits ( e.g. , proteins which have multiple more ). In some embodiments, at least eight features are subunits expressed by different coding regions ) , and then evaluated ( e.g. , at least eight, or more ). In some embodi calculating a ratio thereof, it is possible to compare the ratio ments , at least nine features are evaluated ( e.g. , at least nine , with a known value , reference value , threshold value , other or more ). In some embodiments , at least ten features are sequence information , or an asserted information . In some evaluated ( e.g. , at least ten , or more ) . In some embodiments, embodiments, the evaluated protein subunits are from pro at least eleven features are evaluated ( e.g. , at least eleven , or teins that are present in a human sample, regardless of the more ). In some embodiments , at least twelve features are presence or absence of a certain type of cancer. Without evaluated ( e.g. , at least twelve , or more ). In some embodi wishing to be bound by any theory, such protein is encoded ments , at least thirteen features are evaluated ( e.g. , at least by a housekeeping gene ( e.g. , a positive or negative control thirteen , or more ) . In some embodiments, at least fourteen of a human sample ). In some embodiments, the known features are evaluated ( e.g. , at least fourteen , or more ) . In value , reference value , or threshold value is a fixed ratio . For some embodiments , at least fifteen features are evaluated example , subunit A and subunit B of a known protein has a ( e.g. , at least fifteen , or more ). ratio of 1 : 1 or 2 : 1 . In some embodiments , the evaluated [ 0411 ] The features described herein can be evaluated protein subunits are from proteins that are present in a sequentially, simultaneously, parallel, or a combination human sample having or suspected of having a certain type thereof. As it can be envisioned , the evaluations required to of cancer . evaluate some features, may be useful in evaluating addi [ 0408 ] In some embodiments, the ratio is compared to a tional or other features. Accordingly, where such informa known value . In some embodiments, it is compared to an tion or evaluation results are useful for other determinations, asserted information. In some embodiments, it is compared it is possible to perform evaluations of multiple features at to other sequence information . In some embodiments , the once ( e.g. , simultaneous ), or to use the information for a protein , and subunits thereof are selected for analysis due to follow -on evaluations ( e.g. , sequential ). Additionally, it can their properties. For example , they may degrade quickly and be envisioned to run multiple evaluations at the same time therefore serve as a proxy for the stability and / or quality of of different features ( e.g. , parallel) . In some embodiments, the sample from which the sequence information was gen features are evaluated sequentially. In some embodiments, erated . In some embodiments, they may be selected for their features are evaluated simultaneously. In some embodi variability between subjects, or samples, thereby allowing ments, features are evaluated in parallel. In some embodi for comparison to corroborate or disqualify the identity of ments , features are evaluated in a combination of methods the sequence information . ( e.g. , simultaneously as well as sequentially ). US 2021/0005284 A1 Jan. 7 , 2021 43

Identifying a Cancer Treatment contents of which are incorporated herein by reference . A " gene group ” may be referred to herein as a " module ” . [ 0412 ] A subject's sequencing data obtained using any one [ 0416 ] Exemplary modules may include, but are not lim of the methods described herein may be used for various ited to , Major histocompatibility complex I ( MHC I ) mod clinical purposes including, but not limited to , monitoring ule , Major histocompatibility complex II ( MHC II ) module , the progress of cancer in a subject, assessing the efficacy of Coactivation molecules module , Effector cells module, a treatment for cancer , identifying subjects suitable for a Effector T cell module ; Natural killer cells ( NK cells ) particular treatment, evaluating suitability of a patient for module , T cell traffic module , T cells module , B cells participating in a clinical trial and / or predicting relapse in a module, B cell traffic module , Benign B cells module , subject. Accordingly, described herein are diagnostic and Malignant B cell marker module , M1 signatures module , prognostic methods for cancer treatment based on sequenc Th1 signature module , Antitumor cytokines module , Check ing data obtained using methods described herein . In some point inhibition ( or checkpoint molecules ) module , Follicu embodiments, a method to process RNA expression data as lar dendritic cells module, Follicular B helper T cells mod described herein comprises identifying a cancer treatment ule , Protumor cytokines module, Regulatory T cells ( Treg ) ( also referred to herein as an anti - cancer therapy ) for the module , Treg traffic module, Myeloid -derived suppressor subject using the bias - corrected gene expression data . cells ( MDSCs ) module, MDSC and TAM traffic module , Granulocytes module , Granulocytes traffic module , Eosino Molecular Functional Expression Signatures phil signature model, Neutrophil signature model, Mast cell [ 0413 ] In some embodiments, identifying a cancer treat signature module, M2 signature module , Th2 signature ment for a subject comprises characterizing the cancer or module , Th17 signature module, Protumor cytokines mod tumor in the subject using bias - corrected gene expression ule , Complement inhibition module, Fibroblastic reticular data . In some embodiments, a cancer in a subject is char cells module , Cancer associated fibroblasts ( CAFs ) module , acterized by determining a molecular functional expression Matrix formation ( or Matrix ) module, Angiogenesis module , signature, which may include and / or reflect information Endothelium module , Hypoxia factors module , Coagulation relating to the molecular characteristics of a tumor including module, Blood endothelium module , Lymphatic endothe tumor genetics , pro - tumor microenvironment factors, and lium module, Proliferation rate ( or Tumor proliferation rate ) anti - tumor immune response factors. module, Oncogenes module, PI3K /AKT / mTOR signaling [ 0414 ] A " molecular functional expression signature module, RAS /RAF /MEK signaling module, Receptor tyro (MFES ) ” , as described herein , refers to information relating sine kinases expression module, Growth Factors module , to molecular and cellular composition, and biological pro Tumor suppressors module, Metastasis signature module , cesses that are present within and / or surrounding the tumor. Antimetastatic factors module , and Mutation status module . In some embodiments , the MFES of a patient includes gene [ 0417 ] In some embodiments , each of one or more gene express levels for each of one or more groups of genes groups in an MFES may comprise at least two genes ( e.g. , ( “ gene groups ” ) . In some embodiments, the information in at least two genes , at least three genes, at least four genes , the MFES may be generated using gene expression data at least five genes , at least six genes, at least seven genes, at ( e.g. , bias corrected gene expression data ) for the gene least eight genes , at least nine genes , at least ten genes, or groups obtained by sequencing normal and / or tumor tissue . more than ten genes as shown in the following lists ; in some Though other types of gene expression data may be used to embodiments all of the listed genes are selected from each generate an MFES , it should be appreciated that the inven group ; and in some embodiments the numbers of genes in tors recognized that using bias - corrected gene expression each selected group are not the same. data to generate a molecular functional expression signature [ 0418 ] In some embodiments, the modules in a molecular allows the resulting MFES to more accurately and faithfully functional expression signature may comprise or consist of: represent the molecular functional characteristics of the Major histocompatibility complex I ( MHC I ) module, Major subject's tumor. In turn , applying an MFES determined from histocompatibility complex II ( MHC II ) module , Coactiva bias - corrected gene expression data to identifying a cancer tion molecules module , Effector cells ( or Effector T cell ) therapy for the subject allows for the identification of more module , Natural killer cells ( NK cells ) module, T cells effective therapies, improved ability to determine whether module, B cells module , M1 signatures module , Th1 signa one or more cancer therapies will be effective if adminis ture module , Antitumor cytokines module , Checkpoint inhi tered to the subject, improved ability to identify clinical bition ( or checkpoint molecules ) module, Regulatory T cells trials in which the subject may participate , and / or improve ( Treg) module , Myeloid -derived suppressor cells ( MDSCs ) ments to numerous other prognostic , diagnostic , and clinical module, Neutrophil signature model , M2 signature module , applications. Th2 signature module, Protumor cytokines module, Complement inhibition module , Cancer associated fibro Gene Groups blasts ( CAFs ) module , Angiogenesis module , Endothelium module, Proliferation rate ( or Tumor proliferation rate ) [ 0415 ] A “ gene group ” refers to a group of genes associ module, PI3K / AKT/mTOR signaling module , RAS /RAF / ated with molecular processes present within and / or sur MEK signaling module, Receptor tyrosine kinases expres rounding a tumor. Examples of gene groups and techniques sion module , Growth Factors module , Tumor suppressors for determining gene group expression levels are described module, Metastasis signature module , and Antimetastatic in International PCT Publication WO2018 / 231771 , pub factors module . The modules may additionally include : T lished on Dec. 20 , 2018 , entitled “ Systems and Methods for cell traffic module, Antitumor cytokines module, Treg traffic Generating, Visualizing and Classifying Molecular Func module , MDSC and TAM traffic module, Granulocytes or tional Profiles, ” ( being a publication of PCT Application Granulocyte traffic module , Eosinophil signature model, No .: PCT /US20 /037017 , filed Jun . 12 , 2018 ) , the entire Mast cell signature module , Th17 signature module , Matrix US 2021/0005284 A1 Jan. 7 , 2021 44 formation ( or Matrix ) module , and Hypoxia factors module . B cell marker module : MME , CD70 , CD20 , CD22 , and Such an MFES may be determined for a subject having a PAX5 ; Mi signatures module : NOS2 , IL12A , IL12B , solid cancer ( e.g. , a melanoma) and used , for example, to IL23A , TNF, IL1B , and SOCS3 ; Thl signature module : identify a therapy for treating the solid cancer. IFNG , IL2 , CD40LG , IL15 , CD27 , TBX21 , LTA , and IL21 ; [ 0419 ] In some embodiments, the modules in a molecular Antitumor cytokines module : HMGB1 , TNF, IFNB1 , functional expression signature may comprise or consist of: IFNA2 , CCL3 , TNFSF10 , and FASLG ; Checkpoint inhibi Effector cells ( or Effector T cell ) module , Natural killer cells tion ( or checkpoint molecules ) module: PDCD1 , CD274 , ( NK cells ) module , T cells module, Malignant B cell marker CTLA4 , LAGS , PDCDILG2 , BTLA , HAVCR2 , and VSIR ; module, M1 signatures module , Th1 signature module , Follicular dendritic cells module : CR1 , FCGR2A , FCGR2B , Checkpoint inhibition ( or checkpoint molecules) module , FCGR2C , CR2 , FCER2 , CXCL13 , MADCAMI, ICAMI , Follicular dendritic cells module, Follicular B helper T cells VCAMI , BST1 , LTBR , and TNFRSF1A ; Follicular B module, Protumor cytokines module, Regulatory T cells helper T cells module : CXCR5 , B3GAT1 , ICOS , CD40LG , ( Treg ) module, Neutrophil signature model, M2 signature CD84 , IL21 , BCL6 , MAF , and SAP ; Protumor cytokines module, Th2 signature module, Complement inhibition module : IL10 , TGFB1 , TGFB2 , TGFB3 , IL22 , MIF , module, Fibroblastic reticular cells module , Angiogenesis TNFSF13B , IL6 , and IL7 ; Regulatory T cells ( Treg ) mod module, Blood endothelium module, Proliferation rate ( or ule : TGFB1 , TGFB2 , TGFB3 , FOXP3 , CTLA4 , IL10 , Tumor proliferation rate ) module, Oncogenes module, and TNFRSF18 , TNFR2 , and TNFRSF1B ; Treg traffic module : Tumor suppressors module . The modules may additionally CCL17 , CXCL12 , CXCR4 , CCR4 , CCL22 , CCL1 , CCL2 , include : Major histocompatibility complex I ( MHC I ) mod CCL5 , CXCL13 , and CCL28 ; Myeloid -derived suppressor ule , Major histocompatibility complex II ( MHC II ) module, cells ( MDSCs ) module : IDO1 , ARG1 , IL4R , IL10 , TGFB1 , Coactivation molecules module, B cell traffic module , TGFB2 , TGFB3 , NOS2 , CYBB , CXCR4 , and CD33 ; Benign B cells module, Antitumor cytokines module, Treg MDSC and TAM traffic module : CXCL1 , CXCL5 , CCL2 , traffic module, Mast cell signature module, Th17 signature CCL4 , CCLS , CCR2 , CCL3 , CCL5 , CSF1 , and CXCL8 ; module , Matrix formation ( or Matrix ) module, Hypoxia Granulocytes module : CXCLS , CXCL2 , CXCL1 , CCL11 , factors module, Coagulation module , and Lymphatic CCL24 , KITLG , CCL5 , CXCL5 , CCR3 , CCL26 , PRG2 , endothelium module. Such an MFES may be determined for EPX , RNASE2, RNASE3, IL5RA , GATAL, SIGLEC8 , a subject having follicular lymphoma and used , for example , PRG3 , MPO , ELANE , PRTN3, CTSG , FCGR3B , CXCR1 , to identify a therapy for treating the follicular lymphoma. CXCR2 , CD177 , P13 , FFAR2, PGLYRP1, CMA1 , TPSAB1 , [ 0420 ] In some embodiments, the gene groups in an MS4A2, CPA3, IL4 , IL5 , IL13 , and SIGLEC8; Granulocyte MFES may comprise at least two genes ( e.g. , at least two traffic module: CXCLS , CXCL2 , CXCL1 , CCL11 , CCL24 , genes, at least three genes, at least four genes, at least five KITLG , CCL5 , CXCL5 , CCR3 , and CCL26 ; Eosinophil genes , at least six genes, at least seven genes , at least eight signature model : PRG2 , EPX , RNASE2 , RNASE3, IL5RA , genes, at least nine genes , at least ten genes, or more than ten GATA1, SIGLEC8 , and PRG3 ; Neutrophil signature model : genes as shown in the following lists ; in some embodiments MPO , ELANE, PRTN3 , CTSG , FCGR3B , CXCR1 , all of the listed genes are selected from each group ; and in CXCR2 , CD177 , P13 , FFAR2 , and PGLYRP1; Mast cell some embodiments the numbers of genes in each selected signature module : CMA1, TPSAB1 , MS4A2 , CPA3, IL4 , group are not the same ): Major histocompatibility complex IL5 , IL13 , and SIGLEC8 ; M2 signature module : IL10 , I ( MHC I ) module : HLA - A , HLA - B , HLA - C , B2M , TAP1, VEGFA , TGFB1 , IDO1 , PTGES , MRC1 , CSF1 , LRP1 , and TAP2; Major histocompatibility complex II ( MHC II ) ARG1 , PTGS1 , MSR1 , CD163 , and CSF1R ; Th2 signature module : HLA -DRA , HLA - DRB1 , HLA - DOB , HLA - DPB2 , module : IL4 , IL5 , IL13 , IL10 , IL25 , and GATA3 ; Th17 HLA - DMA , HLA - DOA , HLA - DPA1, HLA - DPB1 , HLA signature module: IL17A , IL22 , IL26 , IL17F , IL21 , and DMB , HLA - DQB1, HLA - DQA1, HLA - DRB5 , HLA RORC ; Protumor cytokines module : IL10 , TGFB1 , TGFB2 , DQA2 , HLA - DQB2 , and HLA - DRB6 ; Coactivation mol TGFB3 , IL22 , and MIF ; Complement inhibition module : ecules module : CD80 , CD86 , CD40 , CD83 , TNFRSF4 , CFD , CFI , CD55 , CD46 , CR1 , and CD59 ; Fibroblastic ICOSLG , CD28 ; Effector cells module : IFNG , GZMA , reticular cells module : DES , VIM , PDGFRA , PDPN , NT5E , GZMB , PRF1 , LCK , GZMK , ZAP70 , GNLY, FASLG , THY1 , ENG , ACTA2, LTBR , TNFRSF1A , VCAMI , TBX21 , EOMES , CD8A , and CD8B ; Effector T cell mod ICAM1 , and BST1 ; Cancer associated fibroblasts ( CAFs ) ule : IFNG , GZMA , GZMB , PRF1 , LCK , GZMK , ZAP70 , module: COL1A1 , COL1A2 , COL4A1 , COL5A1 , TGFB1 , GNLY, FASLG , TBX21 , EOMES , CD8A , and CD8B ; Natu TGFB2 , TGFB3 , ACTA2, FGF2 , FAP, LRP1 , CD248 , ral killer cells ( NK cells ) module : NKG7 , CD160 , CD244 , COL6A1 , COL6A2 , COL6A3 , FBLN1 , LUM , MFAP5 , NCR1 , KLRC2 , KLRK1 , CD226 , GZMH , GNLY, IFNG , LGALS1, and PRELP ; Matrix formation ( or Matrix ) mod KIR2DL4 , KIR2DS1 , KIR2DS2 , KIR2DS3 , KIR2DS4 , ule : MMP9 , FN1 , COL1A1 , COL1A2 , COL3A1 , COL4A1 , KIR2DS5 , EOMES , CLIC3 , FGFBP2 , KLRF1 , and CAI , VTN , LGALS7, TIMP1 , MMP2 , MMP1 , MMP3 , SH2D1B ; T cell traffic module : CXCL9 , CXCL10 , CXCR3 , MMP12 , LGALS9 , MMP7 , and COL5A1 ; Angiogenesis CX3CL1 , CCR7 , CXCL11 , CCL21 , CCL2 , CCL3 , CCL4 , module : VEGFA , VEGFB , VEGFC , PDGFC , CXCL8 , and CCL5 ; T cells module : EOMES , TBX21 , ITK , CD3D , CXCR2 , FLT1, PIGF , CXCL5 , KDR , ANGPT1, ANGPT2 , CD3E , CD3G , TRAC , TRBC1 , TRBC2 , LCK , UBASH3A , TEK , VWF , CDH5 , NOS3 , VCAM1 , MMRN1, LDHA , TRAT1, CD5 , and CD28 ; B cells module : CD19 , MS4A1 , HIF1A , EPAS1, CAI , SPP1 , LOX , SLC2A1 , and LAMP3 ; TNFRSF13C , CD27 , CD24 , CR2 , TNFRSF17 , Endothelium module: VEGFA , NOS3 , KDR , FLT1, TNFRSF13B , CD22 , CD79A , CD79B , BLK , FCRL5 , VCAM1 , VWF , CDH5 , MMRN1, CLEC14A , MMRN2, and PAX5, and STAP1; B cell traffic module : CXCL13 and ECSCR ; Hypoxia factors module : LDHA , HIF1A , EPASI , CXCR5 ; Benign B cells module : CD19 , MS4A1 , CAI , SPP1 , LOX , SLC2A1 , and LAMP3; Coagulation TNFRSF13C , CD27 , CD24 , CR2 , TNFRSF17 , module : HPSE , SERPINE1 , SERPINB2 , F3 , and ANXA2; TNFRSF13B , CD22 , CD79A , CD79B , and BLK ; Malignant Blood endothelium module : VEGFA , NOS3 , KDR , FLT1 , US 2021/0005284 A1 Jan. 7 , 2021 45

VCAMI , VWF, CDH5 , and MMRN1; Lymphatic endothe CD40LG , IL15 , CD27 , TBX21 , LTA , and IL21 ; Checkpoint lium module: CCL21 and CXCL12 ; Proliferation rate ( or inhibition ( or checkpoint molecules ) module : PDCD1 , Tumor proliferation rate) module: MKI67 , ESCO2 , CETN3 , CD274 , CTLA4 , LAG3 , PDCD1LG2 , BTLA , HAVCR2, CDK2 , CCND1, CCNEI , AURKA , AURKB , E2F1 , and VSIR ; Regulatory T cells ( Treg ) module : TGFB1 , MYBL2 , BUB1 , PLK1 , PRC1 , CCNB1 , MCM2 , MCM6 , TGFB2 , TGFB3 , FOXP3 , CTLA4 , IL10 , and TNFRSF1B ; CDK4 , and CDK6 ; Oncogenes module : MDM2 , MYC , Myeloid - derived suppressor cells ( MDSCs) module : IDOI , AKT1 , BCL2 , MME , and SYK ; PI3K / AKT /mTOR signal ARG1 , IL4R , IL10 , TGFB1 , TGFB2 , TGFB3 , NOS2 , ing module : PIK3CA , PIK3CB , PIK3CG , PIK3CD , AKTI , CYBB , CXCR4 , and CD33 ; Neutrophil signature model : MTOR , PTEN , PRKCA , AKT2 , and AKT3 ; RAS /RAF / MPO , ELANE , PRTN3, CTSG , FCGR3B , CXCR1 , MEK signaling module : BRAF, FNTA , FNTB , MAP2K1 , CXCR2 , CD177 , P13 , FFAR2, and PGLYRP1, M2 signature MAP2K2, MKNK1, and MKNK2; Receptor tyrosine module : IL10 , VEGFA , TGFB1 , IDO1 , PTGES , MRC1 , kinases expression module : ALK , AXL , KIT, EGFR , CSF1 , LRP1 , ARG1 , PTGS1 , MSR1 , CD163 , and CSF1R ; ERBB2 , FLT3 , MET, NTRK1 , FGFR1 , FGFR2 , FGFR3 , Th2 signature module : IL4 , IL5 , IL13 , IL10 , IL25 , and ERBB4 , ERBB3 , BCR - ABL , PDGFRA , PDGFRB , and GATA3; Protumor cytokines module : IL10 , TGFB1 , ABL1 ; Growth Factors module : NGF , CSF3 , CSF2 , FGF7 , TGFB2 , TGFB3 , IL22 , and MIF ; Complement inhibition IGF1 , IGF2 , IL7 , and FGF2 ; Tumor suppressors module : module : CFD , CFI , CD55 , CD46 , and CR1 ; Cancer asso TP53 , MLL2 , CREBBP, EP300 , ARIDIA , HIST1H1 , ciated fibroblasts ( CAFs ) module : COL1A1 , COL1A2 , EBF1 , IRF4 , IKZF3 , KLHL6 , PRDM1, CDKN2A , RB1 , COL4A1 , COL5A1 , TGFB1 , TGFB2 , TGFB3 , ACTA2, EPHA7 , TNFAIP3, TNFRSF14 , FAS , SHP1 , SOCS1 , SIK1 , FGF2 , FAP, LRP1 , CD248 , COL6A1 , COL6A2 , COL6A3 , PTEN , DCN , MTAP, AIM2 , and MITF ; Metastasis signature FBLN1 , LUM , MFAP5 , and PRELP ; Angiogenesis module : module : ESRP1 , HOXA1 , SMARCA4 , TWIST1 , NEDD9 , VEGFA , VEGFB , VEGFC , PDGFC , CXCLS , CXCR2 , PAPPA , CTSL , SNAI2 , and HPSE ; Antimetastatic factors FLT1, PIGF , CXCL5 , KDR , ANGPT1, ANGPT2, TEK , module: NCAM1 , CDHI , KISS1 , BRMS1 , ADGRG1 , VWF , CDH5 , NOS3 , VCAM1 , and MMRN1; Endothelium TCF21 , PCDH10 , and MITF ; and Mutation status module : module : VEGFA , NOS3 , KDR , FLT1, VCAMI , VWF , APC , ARIDIA , ATM , ATRX , BAP1 , BRAF , BRCA2 , CDH5 , MMRN1, CLEC14A , MMRN2 , and ECSCR ; Pro CDH1 , CDKN2A , CTCF , CTNNB1, DNMT3A , EGFR , liferation rate ( or Tumor proliferation rate ) module : MK167 , FBXW7 , FLT3 , GATA3, HRAS , IDH1 , KRAS , MAP3K1 , ESCO2 , CETN3 , CDK2 , CCND1 , CCNE1, AURKA , MTOR , NAV3, NCOR1 , NF1 , NOTCH1 , NPM1 , NRAS , AURKB , E2F1 , MYBL2, BUB1 , PLK1 , CCNB1 , MCM2 , PBRMI , PIK3CA , PIK3R1 , PTEN , RB1 , RUNX1, SETD2 , MCM6 , CDK4 , and CDK6 ; PI3K / AKT /mTOR signaling STAG2, TAF1, TP53 , and VHL . In certain embodiments, module: PIK3CA , PIK3CB , CG , PIK CD , AKT1, two or more genes from any combination of the listed MTOR , PTEN , PRKCA , AKT2 , and AKT3 ; RAS / RAF / modules may be used to generate a molecular functional MEK signaling module : BRAF, FNTA , FNTB , MAP2K1, expression signature ( or a visualization thereof, termed an MAP2K2, MKNK1, and MKNK2; Receptor tyrosine “ MF PORTRAIT ” herein ) for a subject. kinases expression module : ALK , AXL , KIT , EGFR , [ 0421 ] In some embodiments, the gene groups in an ERBB2 , FLT3 , MET, NTRK1 , FGFR1 , FGFR2 , FGFR3 , MFES may comprise at least two genes ( e.g. , at least two ERBB4 , ERBB3 , BCR - ABL , PDGFRA , PDGFRB , and genes, at least three genes, at least four genes , at least five ABL1 ; Growth Factors module: NGF , CSF3 , CSF2 , FGF7 , genes, at least six genes, at least seven genes , at least eight IGF1 , IGF2 , IL7 , and FGF2 ; Tumor suppressors module : genes , at least nine genes, at least ten genes, or more than ten TP53 , SIK1 , PTEN , DCN , MTAP, AIM2 , RB1 , and MITF ; genes as shown in the following lists ; in some embodiments Metastasis signature module : ESRP1 , HOXA1 , SMARCA4, all of the listed genes are selected from each group ; and in TWIST1 , NEDD9 , PAPPA , and HPSE ; and Antimetastatic some embodiments the numbers of genes in each selected factors module : NCAM1 , CDHI , KISS1 , and BRMS1 . In group are not the same ) : Major histocompatibility complex some embodiments, the gene groups may further comprise I ( MHC I ) module : HLA - A , HLA - B , HLA - C , B2M , TAP1, at least two genes ( e.g. , least two genes , at least three and TAP2 ; Major histocompatibility complex II ( MHC II ) genes, at least four genes , at least five genes, at least six module ; HLA - DRA , HLA - DRB1 , HLA - DOB , HLA - DPB2 , genes, at least seven genes , at least eight genes, at least nine HLA - DMA , HLA - DOA , HLA - DPA1, HLA - DPB1 , HLA genes , at least ten genes , or more than ten genes as shown DMB , HLA - DQB1 , HLA -DQA1 , HLA - DRB5 , HLA in the following lists ; in some embodiments all of the listed DQA2 , HLA - DQB2 , and HLA - DRB6 ; Coactivation mol genes are selected from each group ; and in some embodi ecules module: CD80 , CD86 , CD40 , CD83 , TNFRSF4 , ments the numbers of genes in each selected group are not ICOSLG , CD28 ; Effector cells ( or Effector T cell ) module : the same ) : T cell traffic module : CXCL9 , CXCL10 , CXCR3 , IFNG , GZMA , GZMB , PRF1 , LCK , GZMK , ZAP70 , CX3CL1 , CCR7 , CXCL11 , CCL21 , CCL2 , CCL3 , CCL4 , GNLY, FASLG , TBX21 , EOMES , CD8A , and CD8B ; Natu and CCL5 ; Antitumor cytokines module : HMGB1 , TNF , ral killer cells ( NK cells ) module : NKG7 , CD160 , CD244 , IFNB1 , IFNA2, CCL3 , TNFSF10 , and FASLG ; Treg traffic NCR1 , KLRC2 , KLRK1 , CD226 , GNLY, KIR2DL4 , module: CCL17 , CXCL12 , CXCR4 , CCR4 , CCL22 , CCL1 , KIR2DS1 , KIR2DS2 , KIR2DS3 , KIR2DS4 , KIR2DS5 , CCL2 , CCL5 , CXCL13 , and CCL28 ; MDSC and TAM EOMES , CLIC3 , FGFBP2 , KLRF1 , and SH2D1B ; T cells traffic module : CXCL1 , CXCL5 , CCL2 , CCL4 , CCL8 , module : TBX21 , ITK , CD3D , CD3E , CD3G , TRAC , CCR2 , CCL3 , CCL5 , CSF1 , and CXCL8 ; Granulocyte TRBCI , TRBC2 , LCK , UBASHBA , TRAT1, CD5 , and traffic module : CXCL8 , CXCL2 , CXCL1 , CCL11 , CCL24 , CD28 ; B cells module: CD19 , MS4A1 , TNFRSF13C , KITLG , CCL5 , CXCL5 , CCR3 , and CCL26 ; Eosinophil CD27 , CD24 , CR2 , TNFRSF17 , TNFRSF13B , CD22 , signature model : PRG2 , EPX , RNASE2, RNASE3 , IL5RA , CD79A , CD79B , BLK , FCRL5 , PAX5 , and STAP1; M1 GATA1, SIGLEC8 , and PRG3; Mast cell signature module : signatures module : NOS2 , IL12A , IL12B , IL23A , TNF , CMA1, TPSAB1 , MS4A2 , CPA3, IL4 , IL5 , IL13 , and IL1B , and SOCS3 ; Th1 signature module : IFNG , IL2 , SIGLEC8 ; Th17 signature module : IL17A , IL22 , ?L26 , US 2021/0005284 A1 Jan. 7 , 2021 46

IL17F , IL21 , and RORC ; Matrix formation ( or Matrix ) comprise at least two genes ( e.g. , at least two genes , at least module : FN1 , CA9 , MMP1 , MMP3 , MMP12 , LGALSO , three genes, at least four genes, at least five genes , at least MMP7 , MMP9 , COL1A1 , COL1A2 , COL4A1 , and six genes , at least seven genes, at least eight genes , at least COL5Al ; and Hypoxia factors module : LDHA , HIF1A , nine genes , at least ten genes, or more than ten genes as EPAS1, CAI , SPP1 , LOX , SLC2A1 , and LAMPS . In certain shown in the following lists ; in some embodiments all of the embodiments , two or more genes from each of the listed listed genes are selected from each group ; and in some modules are included . Any of the foregoing sets of modules embodiments the numbers of genes in each selected group may be used to generate an MFES ( or a visualization are not the same ): Coactivation molecules module : thereof) for a subject with a solid cancer ( e.g. , melanoma ). TNFRSF4 and CD28 ; B cell traffic module: CXCL13 and [ 0422 ] In some embodiments, the gene groups may com CXCR5 ; Antitumor cytokines module : HMGB1 , TNF , prise at least two genes ( e.g. , at least two genes , at least three IFNB1 , IFNA2, CCL3 , TNFSF10 , FASLG ; Treg traffic genes, at least four genes , at least five genes, at least six module : CCL17 , CCR4 , CCL22 , and CXCL13 ; Eosinophil genes, at least seven genes, at least eight genes , at least nine signature model : PRG2 , EPX , RNASE2, RNASE3, IL5RA , genes , at least ten genes, or more than ten genes as shown GATA1, SIGLEC8 , and PRG3; Mast cell signature module : in the following lists ; in some embodiments all of the listed CMA1, TPSAB1 , MS4A2 , CPA3, IL4 , IL5 , IL13 , and genes are selected from each group ; and in some embodi SIGLEC8 ; Th17 signature module : IL17A , IL22 , IL26 , ments the numbers of genes in each selected group are not IL17F, IL21 , and RORC ; Matrix formation ( or Matrix ) the same ) : Effector T cell module : IFNG , GZMA, GZMB , module : MMP9 , FN1 , COL1A1 , COL1A2 , COL3A1 , PRF1 , LCK , GZMK , ZAP70 , GNLY, FASLG , TBX21 , COL4A1 , CAI , VTN , LGALS7 , TIMP1 , and MMP2 ; Hyp EOMES , CD8A , and CD8B ; Natural killer cells ( NK cells ) oxia factors module : LDHA , HIF1A , EPAS1, CAI , SPP1 , module : NKG7 , CD160 , CD244 , NCR1 , KLRC2 , KLRK1 , LOX , SLC2A1 , and LAMP3; Coagulation module : HPSE , CD226 , GZMH , GNLY, IFNG , KIR2DL4 , KIR2DS1 , SERPINET , SERPINB2 , F3 , and ANXA2 ; and Lymphatic KIR2DS2 , KIR2DS3 , KIR2DS4 , and KIR2DS5 ; T cells endothelium module : CCL21 and CXCL12 . In certain module : EOMES , TBX21 , ITK , CD3D , CD3E , CD3G , embodiments, two or more genes from each of the listed TRAC , TRBC1, TRBC2 , LCK , UBASH3A , and TRAT1 ; modules are included . Any of the foregoing sets of modules Benign B cells module : CD19 , MS4A1 , TNFRSF13C , may be used to generate an MFES ( or a visualization CD27 , CD24 , CR2 , TNFRSF17 , TNFRSF13B , CD22 , thereof) for a subject with a follicular lymphoma. CD79A , CD79B , and BLK ; Malignant B cell marker mod ule : MME , CD70 , CD20 , CD22 , and PAX5; M1 signatures [ 0423 ] In some embodiments , an MFES may include one module : NOS2 , IL12A , IL12B , IL23A , TNE 1B , and or more gene groups associated with cancer malignancy and SOCS3 ; Th1 signature module : IFNG , IL2 , CD40LG , IL15 , one or more gene groups associated with the cancer CD27 , TBX21 , LTA , and IL21 ; Checkpoint inhibition ( or microenvironment. In some embodiments, the gene group ( s ) checkpoint molecules ) module : PDCDI , CD274 , CTLA4 , associated with cancer malignancy include the tumor prop LAGS , PDCDILG2 , BTLA , and HAVCR2 ; Follicular den erties gene group . In some embodiments, the gene group ( s ) dritic cells module: CR1 , FCGR2A , FCGR2B , FCGR2C , associated with cancer microenvironment include the tumor CR2 , FCER2 , CXCL13 , MADCAM1, ICAMI , VCAM1 , promoting immune microenvironment gene group , the anti BST1 , LTBR , and TNFRSF1A ; Follicular B helper T cells tumor immune microenvironment gene group , the gene module : CXCR5 , B3GAT1, ICOS , CD40LG , CD84 , IL21 , angiogenesis group , and the gene fibroblasts group . BCL6 , MAF , and SAP ; Protumor cytokines module : IL10 , [ 0424 ] In some embodiments, the gene groups associated TGFB1 , TGFB2 , TGFB3 , IL22 , MIF , TNFSF13B , IL6 , and with cancer malignancy comprises at least three genes from IL7 ; Regulatory T cells ( Treg ) module : TGFB1 , TGFB2 , the following group ( e.g. , at least three genes , at least four TGFB3 , FOXP3 , CTLA4 , IL10 , TNFRSF18 , and TNFR2 ; genes, at least five genes , at least six genes , at least seven Neutrophil signature model : MPO , ELANE , PRTN3, and genes, at least eight genes, at least nine genes, at least ten CTSG ; M2 signature module: IL10 , VEGFA , TGFB1 , genes, or more than ten genes are selected from each group ; IDO1 , PTGES , MRC1, CSF1 , LRP1 , ARG ) , PTGS1 , in some embodiments all of the listed genes are selected MSR1 , CD163 , and CSF1R ; Th2 signature module : IL4 , from each group ): the tumor properties group : MK167 , IL5 , IL13 , IL10 , IL25 , and GATA3; Complement inhibition ESCO2 , CETN3 , CDK2 , CCND1 , CCNEI, AURKA , module : CFD , CFI , CD55 , CD46 , CR1 , and CD59 ; Fibro AURKB , CDK4 , CDK6 , PRC1 , E2F1 , MYBL2 , BUB1 , blastic reticular cells module : DES , VIM , PDGFRA , PDPN , PLK1 , CCNB1 , MCM2 , MCM6 , PIK3CA , PIK3CB , NT5E , THY1 , ENG , ACTA2, LTBR , TNFRSF1A , VCAMI , PIK3CG , PIK3CD , AKT1 , MTOR , PTEN , PRKCA , AKT2 , ICAMI , and BST1 ; Angiogenesis module: VEGFA , AKT3 , BRAF, FNTA , FNTB , MAP2K1 , MAP2K2, VEGFB , VEGFC , PDGFC , CXCLS , CXCR2 , FLT1, PIGF , MKNK1, MKNK2 , ALK , AXL , KIT , EGFR , ERBB2 , FLT3 , CXCL5 , KDR , ANGPT1, ANGPT2, TEK , VWF , and MET, NTRK1 , FGFR1 , FGFR2 , FGFR3 , ERBB4 , ERBB3 , CDH5 ; Blood endothelium module: VEGFA , NOS3 , KDR , BCR - ABL , PDGFRA , PDGFRB , NGF , CSF3 , CSF2 , FGF7 , FLT1, VCAMI , VWF , CDH5 , and MMRN1; Proliferation IGF1 , IGF2 , IL7 , FGF2 , TP53 , SIK1 , PTEN , DCN , MTAP, rate ( or Tumor proliferation rate ) module : MKI67 , ESCO2 , AIM2 , RB1 , ESRP1, CTSL , HOXA1 , SMARCA4, SNAI2 , CETN3 , CDK2 , CCND1 , CCNE1 , AURKA , AURKB , TWISTI , NEDDY , PAPPA , HPSE , KISS1 , ADGRGI , E2F1 , MYBL2, BUB1 , PLK1 , CCNB1 , MCM2 , and BRMS1 , TCF21 , CDH1 , PCDH10 , NCAM1 , MITF , APC , MCM6 ; Oncogenes module : MDM2 , MYC , AKT1 , BCL2 , ARIDIA , ATM , ATRX , BAP1 , BRAF, BRCA2 , CDHI , MME , and SYK ; and Tumor suppressors module : TP53 , CDKN2A , CTCF , CTNNB1, DNMT3A , EGFR , FBXW7 , MLL2 , CREBBP, EP300 , ARID1A , HIST1H1 , EBF1 , IRF4 , FLT3 , GATA3, HRAS , IDH1 , KRAS, MAP3K1, MTOR , IKZF3 , KLHL6 , PRDM1 , CDKN2A , RB1 , EPHA7 , NAV3, NCOR1 , NF1 , NOTCH1 , NPM1 , NRAS, PBRM1 , TNFAIP3 , TNFRSF14 , FAS, SHP1 , and SOCS1 . In some PIK3CA , PIK3R1 , PTEN , RB1 , RUNX1 , SETD2 , STAG2, embodiments, the gene groups of the modules may further TAF1, TP53 , and VHL . US 2021/0005284 A1 Jan. 7 , 2021 47

[ 0425 ] In some embodiments , the gene groups associated B cells group , the anti - tumor microenvironment group , the with cancer microenvironment includes at least three genes checkpoint inhibition group , the Treg group , the MDSC from each of the following groups ( e.g. , at least three genes , group , the granulocytes group , and the tumor -promotive at least four genes , at least five genes, at least six genes, at immune group . least seven genes , at least eight genes, at least nine genes , at [ 0427 ] In some embodiments, the gene groups associated least ten genes, or more than ten genes are selected from with cancer malignancy comprises at least three genes from each group ; in some embodiments all of the listed genes are each of the following groups ( e.g. , at least three genes , at selected from each group ): the anti - tumor immune microen least four genes, at least five genes , at least six genes, at least vironment group : HLA - A , HLA - B , HLA - C , B2M , TAP1, seven genes , at least eight genes , at least nine genes , at least TAP2, HLA -DRA , HLA - DRB1 , HLA - DOB , HLA - DPB2 , ten genes , or more than ten genes are selected from each HLA - DMA , HLA - DOA , HLA -DPA1 , HLA - DPB1 , HLA group ): the proliferation rate group : MK167 , ESCO2 , DMB , HLA - DQB1 , HLA -DQA1 , HLA - DRB5 , HLA CETN3 , CDK2 , CCND1 , CCNEI , AURKA , AURKB , DQA2 , HLA -DQB2 , HLA - DRB6 , CD80 , CD86 , CD40 , CDK4 , CDK6 , PRC1 , E2F1 , MYBL2 , BUB1 , PLK1 , CD83 , TNFRSF4 , ICOSLG , CD28 , IFNG , GZMA , GZMB , CCNB1 , MCM2 , and MCM6 ; the PI3K / AKT mTOR/ sig PRF1 , LCK , GZMK , ZAP70 , GNLY, FASLG , TBX21 , naling group : PIK3CA , PIK3CB , PIK3CG , PIK3CD , AKTI , EOMES , CD8A , CD8B , NKG7 , CD160 , CD244 , NCR1 , MTOR , PTEN , PRKCA , AKT2 , and AKT3 ; the RAS /RAF / KLRC2 , KLRK1 , CD226 , GZMH , GNLY, IFNG , MEK signaling group : BRAF, FNTA , FNTB , MAP2K1, KIR2DL4 , KIR2DS1 , KIR2DS2 , KIR2DS3 , KIR2DS4 , MAP2K2, MKNK1, and MKNK2 ; the receptor tyrosine KIR2DS5 , CXCL9 , CXCL10 , CXCR3 , CX3CL1 , CCR7 , kinases expression group : ALK , AXL , KIT , EGFR , ERBB2 , CXCL11 , CCL21 , CCL2 , CCL3 , CCL4 , CCL5 , EOMES , FLT3 , MET, NTRK1 , FGFR1 , FGFR2 , FGFR3 , ERBB4 , TBX21 , ITK , CD3D , CD3E , CD3G , TRAC , TRBC1 , ERBB3 , BCR - ABL , PDGFRA , and PDGFRB ; the tumor TRBC2 , LCK , UBASH3A , TRAT1 , CD19 , MS4A1 , suppressors group : TP53 , SIK1 , PTEN , DCN , MTAP, TNFRSF13C , CD27 , CD24 , CR2 , TNFRSF17 , AIM2 , and RB1 ; the metastasis signature group : ESRP1 , TNFRSF13B , CD22 , CD79A , CD79B , BLK , NOS2 , CTSL , HOXA1 , SMARCA4, SNAI2 , TWISTI , NEDD9 , IL12A , IL12B , IL23A , TNF , ILIB , SOCS3 , IFNG , IL2 , PAPPA , and HPSE ; the anti- metastatic factors group : CD4OLG , IL15 , CD27 , TBX21 , LTA , IL21 , HMGB1 , TNF , KISS1 , ADGRG1 , BRMS1 , TCF21 , CDH1 , PCDH10 , IFNB1 , IFNA2, CCL3 , TNFSF10 , and FASLG ; the tumor NCAM1 , and MITF ; and the mutation status group : APC , promoting immune microenvironment group : PDCD1 , ARIDIA , ATM , ATRX , BAP1 , BRAF , BRCA2 , CDHI , CD274 , CTLA4 , LAG3 , PDCD1LG2 , BTLA , HAVCR2, CDKN2A , CTCF , CTNNB1, DNMT3A , EGFR , FBXW7 , VSIR , CXCL12 , TGFB1 , TGFB2 , TGFB3 , FOXP3 , FLT3 , GATA3, HRAS , IDH1 , KRAS, MAP3K1, MTOR , CTLA4 , IL10 , TNFRSF1B , CCL17 , CXCR4 , CCR4 , NAV3 , NCOR1 , NF1 , NOTCH1 , NPM1 , NRAS , PBRMI , CCL22 , CCL1 , CCL2 , CCL5 , CXCL13 , CCL28 , IDOL PIK3CA , PIK3R1 , PTEN , RB1 , RUNX1, SETD2 , STAG2, ARGI , IL4R , IL10 , TGFB1 , TGFB2 , TGFB3 , NOS2 , TAF1 , TP53 , and VHL . CYBB , CXCR4 , CD33 , CXCL1 , CXCL5 , CCL2 , CCL4 , [ 0428 ] In some embodiments, the gene groups associated CCLS , CCR2 , CCL3 , CCL5 , CSF1 , CXCL8 , CXCL8 , with cancer microenvironment comprises at least three CXCL2 , CXCL1 , CCL11 , CCL24 , KITLG , CCL5 , CXCL5 , genes from each of the following groups ( e.g. , at least three CCR3 , CCL26 , PRG2 , EPX , RNASE2, RNASE3, IL5RA , genes, at least four genes , at least five genes, at least six GATAL, SIGLECS , PRG3 , CMA1, TPSAB1 , MS4A2 , genes, at least seven genes , at least eight genes , at least nine CPA3, IL4 , IL5 , IL13 , SIGLEC8 , MPO , ELANE , PRTN3, genes, at least ten genes , or more than ten genes are selected CTSG , IL10 , VEGFA , TGFB1 , IDOL PTGES , MRC1 , from each group ): the cancer associated fibroblasts group : CSF1 , LRP1 , ARG1 , PTGS1 , MSR1 , CD163 , CSFIR , IL4 , LGALS1 , COL1A1 , COL1A2 , COL4A1 , COL5A1 , IL5 , IL13 , IL10 , IL25 , GATA3, IL10 , TGFB1 , TGFB2 , TGFB1 , TGFB2 , TGFB3 , ACTA2, FGF2 , FAP, LRP1 , TGFB3 , IL22 , MIF , CFD , CFI , CD55 , CD46 , and CR1 ; the CD248 , COL6A1 , COL6A2 , and COL6A3 ; the angiogen fibroblasts group : LGALS1, COL1A1 , COL1A2 , COL4A1 , esis group : VEGFA , VEGFB , VEGFC , PDGFC , CXCLS , COL5A1 , TGFB1 , TGFB2 , TGFB3 , ACTA2, FGF2 , FAP, CXCR2 , FLT1, PIGF , CXCL5 , KDR , ANGPT1, ANGPT2 , LRP1 , CD248 , COL6A1 , COL6A2 , and COL6A3 ; and the TEK , VWF , CDH5 , NOS3 , KDR , VCAMI , MMRN1, angiogenesis group : VEGFA , VEGFB , VEGFC , PDGFC , LDHA , HIF1A , EPASI, CAO , SPP1 , LOX , SLC2A1 , and CXCLS , CXCR2 , FLT1 , PIGF , CXCL5 , KDR , ANGPT1, LAMP3 ; the antigen presentation group : HLA - A , HLA - B , ANGPT2, TEK , VWF, CDH5 , NOS3 , KDR , VCAMI , HLA - C , B2M , TAP1, TAP2, HLA -DRA , HLA - DRB1 , MMRN1, LDHA , HIF1A , EPASI , CAI , SPP1 , LOX , HLA - DOB , HLA - DPB2 , HLA - DMA, HLA - DOA , HLA SLC2A1 , and LAMPS . In some embodiments, an unequal DPA1, HLA - DPB1 , HLA - DMB , HLA - DQB1 , HLA number of genes may be selected from each of the listed DQA1 , HLA - DRB5 , HLA - DQA2 , HLA - DQB2 , HLA groups for use . In specific embodiments, all or almost all of DRB6 , CD80 , CD86 , CD40 , CD83 , TNFRSF4 , ICOSLG , the listed genes are used . and CD28 ; the cytotoxic T and NK cells group : IFNG , [ 0426 ] In some embodiments , gene groups associated with GZMA , GZMB , PRF1 , LCK , GZMK , ZAP70 , GNLY, cancer malignancy are : the proliferation rate group , the FASLG , TBX21 , EOMES , CD8A , CD8B , NKG7 , CD160 , PI3K / AKT /mTOR signaling group , the RAS /RAF /MEK CD244 , NCR1 , KLRC2 , KLRK1 , CD226 , GZMH , GNLY, signaling group , the receptor tyrosine kinases expression IFNG , KIR2DL4 , KIR2DS1 , KIR2DS2 , KIR2DS3, group , the tumor suppressors group , the metastasis signature KIR2DS4 , KIR2DS5 , CXCL9 , CXCL10 , CXCR3 , group , the anti -metastatic factors group, and the mutation CX3CL1 , CCR7 , CXCL11 , CCL21 , CCL2 , CCL3 , CCL4 , status group . In some embodiments , the gene groups asso CCL5 , EOMES , TBX21 , ITK , CD3D , CD3E , CD3G , ciated with cancer microenvironment are : the cancer asso TRAC , TRBC1 , TRBC2 , LCK , UBASHBA , and TRAT1 ; ciated fibroblasts group , the angiogenesis group , the antigen the B cells group : CD19 , MS4A1 , TNFRSF13C , CD27 , presentation group , the cytotoxic T and NK cells group , the CD24 , CR2 , TNFRSF17 , TNFRSF13B , CD22 , CD79A , US 2021/0005284 A1 Jan. 7 , 2021 48

CD79B , and BLK ; the anti - tumor microenvironment group : TWIST1 , NEDD9 , PAPPA , and HPSE ; the anti -metastatic NOS2 , IL12A , IL12B , IL23A , TNF , ILIB , SOCS3 , IFNG , factors group : KISS1 , ADGRG1 , BRMS1 , TCF21 , CDH1 , IL2 , CD40LG , IL15 , CD27 , TBX21 , LTA , IL21 , HMGB1 , PCDH10 , NCAM1 , and MITF ; and the mutation status TNF , IFNB1 , IFNA2, CCL3 , TNFSF10 , and FASLG ; the group : APC , ARIDIA , ATM , ATRX , BAP1 , BRAF, checkpoint inhibition group : PDCD1 , CD274 , CTLA4, BRCA2 , CDH1 , CDKN2A , CTCF , CTNNB1, DNMT3A , LAGS , PDCD1LG2 , BTLA , HAVCR2, and VSIR ; the Treg EGFR , FBXW7 , FLT3 , GATA3, HRAS, IDH1 , KRAS, group : CXCL12 , TGFB1 , TGFB2 , TGFB3 , FOXP3 , MAP3K1, MTOR , NAV3 , NCOR1 , NF1 , NOTCH1 , NPM1 , CTLA4 , IL10 , TNFRSF1B , CCL17 , CXCR4 , CCR4 , CCL22 , CCL1 , CCL2 , CCL5 , CXCL13 , and CCL28 ; the NRAS, PBRM1 , PIK3CA , PIK3R1 , PTEN , RB1 , RUNX1, MDSC group : IDO1 , ARGI , IL4R , IL10 , TGFB1 , TGFB2 , SETD2 , STAG2, TAF1, TP53 , and VHL . In some embodi TGFB3 , NOS2 , CYBB , CXCR4 , CD33 , CXCL1 , CXCL5 , ments, the plurality of gene groups associated with cancer CCL2 , CCL4 , CCLS , CCR2, CCL3 , CCL5 , CSF1 , and microenvironment comprises at least three genes from each CXCL8 ; the granulocytes group : CXCLS , CXCL2 , CXCL1 , of the following groups : the cancer associated fibroblasts CCL11 , CCL24 , KITLG , CCL5 , CXCL5 , CCR3 , CCL26 , group : LGALSI, COL1A1 , COL1A2 , COL4A1 , COL5A1, PRG2 , EPX , RNASE2, RNASE3 , IL5RA , GATAI, TGFB1 , TGFB2 , TGFB3 , ACTA2, FGF2 , FAP, LRP1 , SIGLEC8 , PRG3 , CMA1, TPSAB1 , MS4A2 , CPA3, IL4 , CD248 , COL6A1 , COL6A2 , and COL6A3 ; the angiogen IL5 , IL13 , SIGLEC8 , MPO , ELANE, PRTN3, and CTSG ; esis group : VEGFA , VEGFB , VEGFC , PDGFC , CXCL8 , the tumor - promotive immune group : IL10 , VEGFA , CXCR2 , FLT1, PIGF , CXCL5 , KDR , ANGPT1, ANGPT2 , TGFB1 , IDOI , PTGES , MRC1 , CSF1 , LRP1 , ARGI , TEK , VWF , CDH5 , NOS3 , KDR , VCAM1 , MMRN1, PTGS1 , MSR1 , CD163 , CSFIR , IL4 , IL5 , IL 13 , IL10 , IL25 , LDHA , HIF1A , EPAS1, CAS , SPP1 , LOX , SLC2A1 , and GATA3 , IL10 , TGFB1 , TGFB2 , TGFB3 , IL22 , MIF , CFD , LAMPS ; the MHCI group : HLA - A , HLA - B , HLA - C , B2M , CFI , CD55 , CD46 , and CR1 . In some embodiments, an TAP1, and TAP2; the MHCII group : HLA - DRA , HLA unequal number of genes may be selected from each of the DRB1 , HLA - DOB , HLA - DPB2 , HLA - DMA , HLA - DOA , listed groups for use . In specific embodiments , all or almost HLA -DPA1 , HLA - DPB1 , HLA - DMB , HLA - DQB1 , HLA all of the listed genes are used . DQA1 , HLA - DRB5 , HLA - DQA2 , HLA - DQB2 , and HLA [ 0429 ] In some embodiments, the gene groups associated DRB6 ; the coactivation molecules group : CD80 , CD86 , with cancer malignancy are : the proliferation rate group , the CD40 , CD83 , TNFRSF4 , ICOSLG , and CD28 ; the effector PI3K / AKT/mTOR signaling group , the RAS /RAF MEK/ cells group : IFNG , GZMA, GZMB , PRF1 , LCK , GZMK , signaling group , the receptor tyrosine kinases expression ZAP70 , GNLY, FASLG , TBX21 , EOMES , CD8A , and group , the growth factors group , the tumor suppressors CD8B ; the NK cells group : NKG7 , CD160 , CD244 , NCR1 , group , the metastasis signature group , the anti -metastatic KLRC2 , KLRK1 , CD226 , GZMH , GNLY, IFNG , factors group , and the mutation status group . In some KIR2DL4 , KIR2DS1 , KIR2DS2 , KIR2DS3 , KIR2DS4 , and embodiments, the plurality of gene groups associated with KIR2DS5 ; the T cell traffic group : CXCL9 , CXCL10 , cancer microenvironment are : the cancer associated fibro CXCR3 , CX3CL1 , CCR7 , CXCL11 , CCL21 , CCL2 , CCL3 , blasts group , the angiogenesis group , the MHCI group , the CCL4 , and CCL5 ; the T cells group : EOMES , TBX21 , ITK , MHCII group , the coactivation molecules group , the effector CD3D , CD3E , CD3G , TRAC , TRBC1 , TRBC2 , LCK , cells group , the NK cells group , the T cell traffic group , the UBASH3A , and TRAT1; the B cells group : CD19 , MS4A1 , T cells group , the B cells group , the M1 signatures group, the TNFRSF13C , CD27 , CD24 , CR2, TNFRSF17 , Th1 signature group , the antitumor cytokines group , the TNFRSF13B , CD22 , CD79A , CD79B , and BLK ; the M1 checkpoint inhibition group , the Treg group , the MDSC signatures group : NOS2 , IL12A , IL12B , IL23A , TNF, IL1B , group , the granulocytes group , the M2 signature group , the and SOCS3 ; the Th1 signature group : IFNG , IL2 , CD40LG , Th2 signature group , the protumor cytokines group , and the IL15 , CD27 , TBX21 , LTA , and IL21 ; the antitumor cytok complement inhibition group . ines group : HMGB1 , TNF, IFNB1 , IFNA2, CCL3 , [ 0430 ] In some embodiments, the gene groups associated TNFSF10 , and FASLG ; the checkpoint inhibition group : with cancer malignancy comprises at least three genes from PDCD1 , CD274 , CTLA4, LAG3 , PDCD1LG2 , BTLA , each of the following groups ( e.g. , at least three genes , at HAVCR2 , and VSIR ; the Treg group : CXCL12 , TGFB1 , least four genes, at least five genes, at least six genes, at least TGFB2 , TGFB3 , FOXP3 , CTLA4 , IL10 , TNFRSF1B , seven genes , at least eight genes, at least nine genes , at least CCL17 , CXCR4 , CCR4 , CCL22 , CCL1 , CCL2 , CCL5 , ten genes , or more than ten genes are selected from each CXCL13 , and CCL28 ; the MDSC group : IDO1 , ARG1 , group ): the proliferation rate group : MKI67 , ESCO2 , IL4R , IL10 , TGFB1 , TGFB2 , TGFB3 , NOS2 , CYBB , CETN3 , CDK2 , CCND1 , CCNE1 , AURKA , AURKB , CXCR4 , CD33 , CXCL1 , CXCL5 , CCL2 , CCL4 , CCL8 , CDK4 , CDK6 , PRC1 , E2F1 , MYBL2, BUB1 , PLK1 , CCR2 , CCL3 , CCL5 , CSF1 , and CXCL8 ; the granulocytes CCNB1 , MCM2 , and MCM6 ; the PI3K / AKT /mTOR sig group : CXCL8 , CXCL2 , CXCL1 , CCL11 , CCL24 , KITLG , naling group: PIK3CA , PIK3CB , PIK3CG , PIK3CD , AKTI , CCL5 , CXCL5 , CCR3 , CCL26 , PRG2 , EPX , RNASE2, MTOR , PTEN , PRKCA , AKT2 , and AKT3 ; the RAS /RAF / RNASE3 , IL5RA , GATAL, SIGLEC8 , PRG3 , CMA1, MEK signaling group : BRAF , FNTA , FNTB , MAP2K1 , TPSAB1 , MS4A2 , CPA3, IL4 , IL5 , IL13 , SIGLEC8 , MPO , MAP2K2, MKNK1, and MKNK2 ; the receptor tyrosine ELANE , PRTN3, and CTSG ; the M2 signature group : IL10 , kinases expression group : ALK , AXL , KIT, EGFR , ERBB2 , VEGFA , TGFB1 , IDOL PTGES , MRC1 , CSF1 , LRP1 , FLT3 , MET, NTRK1 , FGFR1 , FGFR2 , FGFR3 , ERBB4 , ARG1 , PTGS1 , MSR1 , CD163 , and CSF1R ; the Th2 sig ERBB3 , BCR - ABL , PDGFRA , and PDGFRB ; the growth nature group : IL4 , IL5 , IL13 , IL10 , IL25 , and GATA3 ; the factors group : NGF, CSF3 , CSF2 , FGF7 , IGF1 , IGF2 , IL7 , protumor cytokines group : IL10 , TGFB1 , TGFB2 , TGFB3 , and FGF2 ; the tumor suppressors group : TP53 , SIK1 , IL 22 , and MIF ; and the complement inhibition group : CFD , PTEN , DCN , MTAP, AIM2 , and RB1 ; the metastasis sig CFI , CD55 , CD46 , and CR1 . In some embodiments, an nature group : ESRP1 , CTSL , HOXA1 , SMARCA4, SNAI2 , unequal number of genes may be selected from each of the US 2021/0005284 A1 Jan. 7. 2021 49 listed groups for use . In specific embodiments, all or almost fibroblasts in a cancer ( e.g. , a tumor ). In some embodiments, all of the listed genes are used . fibroblast enriched tumors comprise high levels of fibroblast [ 0431 ] A molecular functional expression signature may cells . include any suitable number of gene groups . In some embodiments, an MFES comprises at least 2 , at least 3 , at Predicting Therapy Response least 4 , at least 5 , at least 6 , at least 7 , at least 8 , at least 9 , [ 0435 ] In some embodiments, sequencing data obtained at least 10 , at least 11 , at least 12 , at least 13 , at least 14 , at using systems and methods described herein ( e.g. , bias least 15 , at least 16 , at least 17 , at least 18 , at least 19 , at least corrected gene expression data , data processed using the 20 , at least 21 , at least 22 , at least 23 , at least 24 , at least 25 , quality control techniques described herein , etc.) may be at least 26 , at least 27 , or at least 28 modules . In some used for identifying subjects suitable for a particular treat embodiments, an MFES comprises up to 2 , up to 3 , up to 4 , ment, and / or predicting likelihood of a patient's response or up to 5 , up to 6 , up to 7 , up to 8 , up to 9 , up to 10 , up to 11 , lack thereof to a particular treatment and / or predicting up to 12 , up to 13 , up to 14 , up to 15 , up to 16 , up to 17 , up whether a patient may or may not have one or more adverse to 18 , up to 19 , up to 20 , up to 21 , up to 22 , up to 23 , up to reactions to a particular therapy as described in International 24 , up to 25 , up to 26 , up to 27 , or up to 28 gene groups. PCT Publication WO2018 / 231771 , published on Dec. 20 , 2018 , entitled “ Systems and Methods for Generating, Visu Tumor Microenvironment Types alizing and Classifying Molecular Functional Profiles , ” (be ing a publication of PCT Application No .: PCT /US2018 / [ 0432 ] The inventors have recognized that a molecular 037017 , filed Jun . 12 , 2018 ) , the entire contents of which are functional expression signature for a subject having, sus incorporated herein by reference . pected of having, or at risk of having cancer may provide [ 0436 ] In some embodiments, sequencing data obtained as valuable information about the microenvironment of the described herein ( e.g. , bias - corrected gene expression data , subject's cancer. The inventors have recognized that a data processed using the quality control techniques subject's MFES may be used to classify the subject's described herein , etc. ) is useful for identifying a subject microenvironment as being one of multiple types. For suitable for a particular treatment. In some embodiments, example , in some embodiments , the MFES may be used to sequencing data ( e.g. , bias - corrected gene expression data , classify the subject's microenvironment as being one of four data processed using the quality control techniques different types of microenvironment ( e.g. , “ 1st MF profile ” described herein , etc. ) obtained as described herein is useful or " type A ” microenvironment, “ 2nd MF profile ” or “ type for predicting likelihood of a patient's response or lack B ” microenvironment, “ 3rd MF profile ” or “ type C ” thereof to a particular treatment . In some embodiments , microenvironment, “ 4th MF profile ” or “ type D ” microen sequencing data obtained as described herein ( e.g. , bias vironment, which are described in International PCT Pub corrected gene expression data , data processed using the lication WO2018 / 231771 , which is incorporated by refer quality control techniques described herein , etc.) is useful ence herein in its entirety ). In turn , the identified for predicting whether a patient may or may not have one or microenvironment type be used to identify a cancer therapy more adverse reactions to a particular therapy. and / or determine the effectiveness ( or lack thereof) for one [ 0437 ] In some embodiments, predicted efficacy of an or more cancer therapies. Examples of identifying cancer immune checkpoint blockade therapy may be determined therapies based on a type of cancer microenvironment ( e.g. , using sequencing data obtained as described herein ( e.g. , determined from gene group expression data, for example , bias - corrected gene expression data, data processed using part of a molecular functional expression signature or a the quality control techniques described herein , etc.) as molecular functional profile ) are described in International described in International PCT Publication WO2018 / PCT Publication WO2018 / 231771 . 231772 , published on Dec. 20 , 2018 , entitled “ Systems and [ 0433 ] First MF profile cancers may also be described as Methods for Identifying Responders and Non -Responders to “ inflamed / vascularized ” and / or “ inflamed / fibroblast- en Immune Checkpoint Blockade Therapy ” ( being a publica riched ” ; Second MF profile cancers may also be described as tion of International patent application number PCT / " inflamed /non - vascularized and / or “ inflamed /non - fibro US2018 / 037018 , filed Jun . 12 , 2018 ) , the entire contents of blast - enriched ” ; Third MF profile cancers may also be which are incorporated herein by reference . described as “ non - inflamed / vascularized ” and / or “ non - in [ 0438 ] In some embodiments, sequencing data obtained as flamed / fibroblast - enriched ” ; and Fourth MF profile cancers described herein ( e.g. , bias - corrected gene expression data , may also be described as “ non - inflamed /non - vascularized ” data processed using the quality control techniques and / or " non - inflamed / non - fibroblast - enriched ” and /or described herein , etc. ) is useful for determining a biomarker , " immune desert .>> ” a biomarker score , a normalized biomarker score , a therapy score , and / or an impact score as described in International [ 0434 ] In some embodiments, “ inflamed ” refers to the PCT Publication WO2018 / 231762 , published on Dec. 20 , level of compositions and processes related to inflammation 2018 , entitled “ Systems and Methods for Identifying Cancer in a cancer ( e.g. , a tumor ). In some embodiments , inflamed Treatments from Normalized Biomarker Scores ” (being a cancers ( e.g. , tumors ) are highly infiltrated by immune cells , publication of International patent application number PCT / and are highly active with regard to antigen presentation and US2018 / 037008 , filed Jun . 12 , 2018 ) , the entire contents of T - cell activation . In some embodiments , “ vascularized " which are incorporated herein by reference . refers to the formation of blood vessels in a cancer ( e.g. , a tumor ). In some embodiments, vascularized cancers ( e.g. , tumors ) comprise high levels of cellular compositions and Methods of Treatment process related to blood vessel formation . In some embodi [ 0439 ] In certain methods described herein , an effective ments , “ fibroblast enriched ” refers to the level or amount of amount of anti - cancer therapy described herein may be US 2021/0005284 A1 Jan. 7 , 2021 50 administered or recommended for administration to a sub or more doses of the anti - cancer therapeutic agent. Individu ject ( e.g. , a human ) in need of the treatment via a suitable als may be administered incremental dosages of the anti route ( e.g. , intravenous administration ). cancer therapeutic agent. To assess efficacy of an adminis [ 0440 ] The subject to be treated by the methods described tered anti- cancer therapeutic agent, one or more aspects of herein may be a human patient having , suspected of having , a cancer ( e.g. , tumor formation , tumor growth , tumor type , or at risk for a cancer. Examples of a cancer include , but are MF expression signature ) may be analyzed . not limited to , melanoma, lung cancer , brain cancer, breast [ 0445 ) Generally, for administration of any of the anti cancer , colorectal cancer, pancreatic cancer , liver cancer , cancer antibodies described herein , an initial candidate dos prostate cancer , skin cancer, kidney cancer, bladder cancer, age may be about 2 mg / kg . For the purpose of the present or prostate cancer . The subject to be treated by the methods disclosure , a typical daily dosage might range from about described herein may be a mammal ( e.g. , may be a human ). any of 0.1 ug /kg to 3 ug /kg to 30 ug /kg to 300 ug /kg to 3 Mammals include but are not limited to : farm animals ( e.g. , mg /kg , to 30 mg /kg to 100 mg/ kg or more , depending on the livestock ) , sport animals, laboratory animals, pets , primates, factors mentioned above . For repeated administrations over horses , dogs , cats , mice , and rats . several days or longer, depending on the condition , the [ 0441 ] A subject having a cancer may be identified by treatment is sustained until a desired suppression or ame routine medical examination , e.g. , laboratory tests , biopsy, lioration of symptoms occurs or until sufficient therapeutic PET scans, CT scans , or ultrasounds. A subject suspected of levels are achieved to alleviate a cancer, or one or more having a cancer might show one or more symptoms of the symptoms thereof. An exemplary dosing regimen comprises disorder, e.g. , unexplained weight loss , fever, fatigue, cough , administering an initial dose of about 2 mg/ kg , followed by pain , skin changes, unusual bleeding or discharge , and / or a weekly maintenance dose of about 1 mg/ kg of the anti thickening or lumps in parts of the body. A subject at risk for body, or followed by a maintenance dose of about 1 mg/ kg a cancer may be a subject having one or more of the risk every other week . However, other dosage regimens may be factors for that disorder. For example , risk factors associated useful, depending on the pattern of pharmacokinetic decay with cancer include, but are not limited to , ( a ) viral infection that the practitioner ( e.g. , a medical doctor) wishes to ( e.g. , herpes virus infection ) , ( b ) age , ( c ) family history, ( d ) achieve . For example, dosing from one - four times a week is heavy alcohol consumption , ( e ) obesity, ( f) genetics, and ( g ) contemplated. In some embodiments , dosing ranging from chemical or toxin exposure , and ( h ) tobacco use . about 3 ug /mg to about 2 mg /kg ( such as about 3 ug /mg , [ 0442 ] “ An effective amount” as used herein refers to the about 10 ug /mg , about 30 ug /mg , about 100 ug /mg , about amount of each active agent required to confer therapeutic 300 ug /mg , about 1 mg/ kg , and about 2 mg/ kg ) may be used . effect on the subject, either alone or in combination with one In some embodiments , dosing frequency is once every week , or more other active agents. Effective amounts vary, as every 2 weeks, every 4 weeks , every 5 weeks, every 6 recognized by those skilled in the art, depending on the weeks, every 7 weeks , every 8 weeks , every 9 weeks, or particular condition being treated , the severity of the con every 10 weeks ; or once every month , every 2 months, or dition , the individual patient parameters including age , every 3 months, or longer. The progress of this therapy may physical condition , size , gender and weight, the duration of be monitored by conventional techniques and assays and / or the treatment, the nature of concurrent therapy ( if any ), the by monitoring cancer Types A - D as described herein . The specific route of administration and like factors within the dosing regimen ( including the therapeutic used ) may vary knowledge and expertise of the health practitioner. These over time. factors are well known to those of ordinary skill in the art [ 0446 ] When the anti - cancer therapeutic agent is not an and can be addressed with no more than routine experimen antibody, it may be administered at the rate of about 0.1 to tation . It is generally preferred that a maximum dose of the 300 mg/ kg of the weight of the patient divided into one to individual components or combinations thereof be used , that three doses , or as described herein . In some embodiments , is , the highest safe dose according to sound medical judg for an adult patient of normal weight, doses ranging from ment . It will be understood by those of ordinary skill in the about 0.3 to 5.00 mg /kg may be administered . The particular art, however, that a patient may insist upon a lower dose or dosage regimen , e.g. , dose , timing, and / or repetition , will tolerable dose for medical reasons , psychological reasons , or depend on the particular subject and that individual's medi for virtually any other reasons . cal history, as well as the properties of the individual agents [ 0443 ] Empirical considerations , such as the half - life of a ( such as the half -life of the agent, and other considerations therapeutic compound , generally contribute to the determi well known in the art) . nation of the dosage. For example, antibodies that are [ 0447 ] For the purpose of the present disclosure , the compatible with the human immune system , such as human appropriate dosage of an anti- cancer therapeutic agent will ized antibodies or fully human antibodies, may be used to depend on the specific anti -cancer therapeutic agent ( s) ( or prolong half -life of the antibody and to prevent the antibody compositions thereof) employed , the type and severity of being attacked by the host's immune system . Frequency of cancer, whether the anti- cancer therapeutic agent is admin administration may be determined and adjusted over the istered for preventive or therapeutic purposes , previous course of therapy, and is generally ( but not necessarily ) therapy, the patient's clinical history and response to the based on treatment, and / or suppression , and / or amelioration , anti - cancer therapeutic agent, and the discretion of the and / or delay of a cancer . Alternatively , sustained continuous attending physician . Typically the clinician will administer release formulations of an anti - cancer therapeutic agent may an anti - cancer therapeutic agent, such as an antibody, until be appropriate . Various formulations and devices for achiev a dosage is reached that achieves the desired result . ing sustained release are known in the art . [ 0448 ] Administration of an anti - cancer therapeutic agent [ 0444 ] In some embodiments, dosages for an anti - cancer can be continuous or intermittent, depending, for example, therapeutic agent as described herein may be determined upon the recipient's physiological condition, whether the empirically in individuals who have been administered one purpose of the administration is therapeutic or prophylactic, US 2021/0005284 A1 Jan. 7 , 2021 51 and other factors known to skilled practitioners . The admin cally, vaginally or via an implanted reservoir. The term istration of an anti - cancer therapeutic agent ( e.g. , an anti “ parenteral” as used herein includes subcutaneous , intracu cancer antibody ) may be essentially continuous over a taneous , intravenous, intramuscular, intraarticular, intraarte preselected period of time or may be in a series of spaced rial, intrasynovial, intrasternal, intrathecal, intralesional, and dose , e.g. , either before , during , or after developing cancer . intracranial injection or infusion techniques . In addition , an [ 0449 ] As used herein , the term “ treating ” refers to the anti -cancer therapeutic agent may be administered to the application or administration of a composition including one subject via injectable depot routes of administration such as or more active agents to a subject, who has a cancer, a using 1- , 3- , or 6 - month depot injectable or biodegradable symptom of a cancer, or a predisposition toward a cancer , materials and methods . with the purpose to cure , heal, alleviate , relieve , alter, [ 0454 ] Injectable compositions may contain various car remedy, ameliorate, improve , or affect the cancer or one or riers such as vegetable oils , dimethylactamide, dimethyfor more symptoms of the cancer, or the predisposition toward mamide , ethyl lactate, ethyl carbonate , isopropyl myristate , a cancer . ethanol, and polyols ( e.g. , glycerol, propylene glycol , liquid [ 0450 ] Alleviating a cancer includes delaying the devel polyethylene glycol , and the like ). For intravenous injection , opment or progression of the disease , or reducing disease water soluble anti - cancer therapeutic agents can be admin severity. Alleviating the disease does not necessarily require istered by the drip method, whereby a pharmaceutical for curative results . As used therein , " delaying ” the develop mulation containing the antibody and a physiologically ment of a disease ( e.g. , a cancer) means to defer, hinder, acceptable excipients is infused . Physiologically acceptable slow , retard , stabilize , and /or postpone progression of the excipients may include , for example , 5 % dextrose , 0.9 % disease . This delay can be of varying lengths of time , saline , Ringer's solution , and / or other suitable excipients. depending on the history of the disease and / or individuals Intramuscular preparations, e.g. , a sterile formulation of a being treated . A method that " delays ” or alleviates the suitable soluble salt form of the anti - cancer therapeutic development of a disease , or delays the onset of the disease , agent, can be dissolved and administered in a pharmaceu is a method that reduces probability of developing one or tical excipient such as Water - for - Injection , 0.9 % saline, more symptoms of the disease in a given time frame and / or and /or 5 % glucose solution . reduces extent of the symptoms in a given time frame, when [ 0455 ] In one embodiment, an anti - cancer therapeutic compared to not using the method . Such comparisons are agent is administered via site - specific or targeted local typically based on clinical studies, using a number of delivery techniques. Examples of site - specific or targeted subjects sufficient to give a statistically significant result. local delivery techniques include various implantable depot [ 0451 ] “ Development ” or “ progression ” of a disease sources of the agent or local delivery catheters, such as means initial manifestations and / or ensuing progression of infusion catheters, an indwelling catheter, or a needle cath the disease . Development of the disease can be detected and eter, synthetic grafts , adventitial wraps , shunts and stents or assessed using clinical techniques known in the art . Alter other implantable devices, site specific carriers, direct injec natively or in addition to the clinical techniques known in tion , or direct application . See , e.g. , PCT Publication No. the art, development of the disease may be detectable and WO 00/53211 and U.S. Pat . No. 5,981,568 , the contents of assessed based on the cancer types described herein . How each of which are incorporated by reference herein for this ever , development also refers to progression that may be purpose . undetectable . For purpose of this disclosure , development or [ 0456 ] Targeted delivery of therapeutic compositions con progression refers to the biological course of the symptoms. taining an antisense polynucleotide, expression vector , or “ Development ” includes occurrence , recurrence , and onset . subgenomic polynucleotides can also be used . Receptor As used herein “ onset ” or “ occurrence ” of a cancer includes mediated DNA delivery techniques are described in , for initial onset and / or recurrence . example, Findeis et al . , Trends Biotechnol. ( 1993 ) 11 : 202 ; [ 0452 ] In some embodiments, the anti - cancer therapeutic Chiou et al . , Gene Therapeutics : Methods And Applications agent ( e.g. , an antibody ) described herein is administered to Of Direct Gene Transfer ( I. A. Wolff, ed . ) ( 1994 ) ; Wu et al . , a subject in need of the treatment at an amount sufficient to J. Biol . Chem . ( 1988 ) 263 : 621 ; Wu et al . , J. Biol . Chem . reduce cancer ( e.g. , tumor) growth by at least 10 % ( e.g. , ( 1994 ) 269 : 542 ; Zenke et al . , Proc . Natl . Acad . Sci . USA 20 % , 30 % , 40 % , 50 % , 60 % , 70 % , 80 % , 90 % or greater ). In ( 1990 ) 87 : 3655 ; Wu et al . , J. Biol . Chem . ( 1991 ) 266 : 338 . some embodiments , the anti -cancer therapeutic agent ( e.g. , The contents of each of the foregoing are incorporated by an antibody ) described herein is administered to a subject in reference herein for this purpose . need of the treatment at an amount sufficient to reduce [ 0457 ] Therapeutic compositions containing a polynucle cancer cell number or tumor size by at least 10 % ( e.g. , 20 % , otide may be administered in a range of about 100 ng to 30 % , 40 % , 50 % , 60 % , 70 % , 80 % , 90 % or more ). In other about 200 mg of DNA for local administration in a gene embodiments, the anti - cancer therapeutic agent is adminis therapy protocol . In some embodiments, concentration tered in an amount effective in altering cancer type . Alter ranges of about 500 ng to about 50 mg , about 1 ug to about natively, the anti- cancer therapeutic agent is administered in 2 mg , about 5 ug to about 500 ug , and about 20 ug to about an amount effective in reducing tumor formation or metas 100 ug of DNA or more can also be used during a gene tasis . therapy protocol. [ 0453 ] Conventional methods, known to those of ordinary [ 0458 ] Therapeutic polynucleotides and polypeptides can skill in the art of medicine , may be used to administer the be delivered using gene delivery vehicles . The gene delivery anti - cancer therapeutic agent to the subject, depending upon vehicle can be of viral or non - viral origin ( e.g. , Jolly, Cancer the type of disease to be treated or the site of the disease . The Gene Therapy ( 1994 ) 1:51 ; Kimura , Human Gene Therapy anti- cancer therapeutic agent can also be administered via ( 1994 ) 5 : 845 ; Connelly, Human Gene Therapy ( 1995 ) other conventional routes , e.g. , administered orally, paren 1 : 185 ; and Kaplitt , Nature Genetics ( 1994 ) 6 : 148 ) . The terally, by inhalation spray , topically, rectally, nasally, buc contents of each of the foregoing are incorporated by US 2021/0005284 A1 Jan. 7. 2021 52 reference herein for this purpose . Expression of such coding tively or in addition to , treatment efficacy can be assessed by sequences can be induced using endogenous mammalian or monitoring tumor type over the course of treatment ( e.g. , heterologous promoters and / or enhancers. Expression of the before , during, and after treatment ). coding sequence can be either constitutive or regulated . [ 0459 ] Viral -based vectors for delivery of a desired poly Combination Therapy nucleotide and expression in a desired cell are well known [ 0463 ] Compared to monotherapies, combinations of in the art. Exemplary viral - based vehicles include , but are treatment approaches showed higher efficacy in many stud not limited to , recombinant retroviruses ( see , e.g. , PCT ies , but the choice of remedies to be combined and designing Publication Nos . WO 90/07936 ; WO 94/03622 ; WO the combination therapy regimen remain speculative . Given 93/25698 ; WO 93/25234 ; WO 93/11230 ; WO 93/10218 ; that the number of possible combinations is now extremely WO 91/02805 ; U.S. Pat . Nos . 5,219,740 and 4,777,127 ; GB high , there is great need for a tool that would help to select Patent No. 2,200,651 ; and EP Patent No. 0 345 242 ) , drugs and combinations of remedies based on objective alphavirus- based vectors ( e.g. , Sindbis virus vectors , Sem information about a particular patient. Use of patient specific liki forest virus (ATCC VR - 67 ; ATCC VR - 1247 ) , Ross information ( e.g. , a patient's sequencing data ) for designing River virus ( ATCC VR - 373 ; ATCC VR - 1246 ) and Venezu or electing a specific combination therapy establishes a elan equine encephalitis virus ( ATCC VR -923 ; ATCC scientific basis for choosing the optimal combination of VR - 1250 ; ATCC VR 1249 ; ATCC VR - 532 ) ) , and adeno preparations. associated virus ( AAV ) vectors ( see , e.g. , PCT Publication [ 0464 ] Also provided herein are methods of treating a Nos . WO 94/12649 , WO 93703769 ; WO 93/19191 ; WO cancer or recommending treating a cancer using any com 94/28938 ; WO 95/11984 and WO 95/00655 ) . Administra bination of anti - cancer therapeutic agents or one or more tion of DNA linked to killed adenovirus as described in anti- cancer therapeutic agents and one or more additional Curiel, Hum . Gene Ther . ( 1992 ) 3 : 147 can also be therapies ( e.g. , surgery and / or radiotherapy ). The term com employed . The contents of each of the foregoing are incor bination therapy, as used herein , embraces administration of porated by reference herein for this purpose . more than one treatment ( e.g. , an antibody and a small [ 040] Non - viral delivery vehicles and methods can also molecule or an antibody and radiotherapy ) in a sequential be employed , including , but not limited to , polycationic manner, that is , wherein each therapeutic agent is adminis condensed DNA linked or unlinked to killed adenovirus tered at a different time , as well as administration of these alone ( see , e.g. , Curiel, Hum . Gene Ther. ( 1992 ) 3 : 147 ) ; therapeutic agents, or at least two of the agents or therapies , ligand -linked DNA ( see , e.g. , Wu, J. Biol . Chem . ( 1989 ) in a substantially simultaneous manner . 264 : 16985 ) ; eukaryotic cell delivery vehicles cells ( see , e.g. , [ 0465 ] Sequential or substantially simultaneous adminis U.S. Pat . No. 5,814,482 , PCT Publication Nos . WO tration of each agent or therapy can be affected by any 95/07994 ; WO 96/17072 ; WO 95/30763 ; and WO appropriate route including , but not limited to , oral routes, 97/42338 ) and nucleic charge neutralization or fusion with intravenous routes , intramuscular, subcutaneous routes, and cell membranes. Naked DNA can also be employed . Exem direct absorption through mucous membrane tissues . The plary naked DNA introduction methods are described in agents or therapies can be administered by the same route or PCT Publication No. WO 90/11092 and U.S. Pat . No. by different routes . For example , a first agent ( e.g. , a small 5,580,859 . Liposomes that can act as gene delivery vehicles molecule ) can be administered orally, and a second agent are described in U.S. Pat . No. 5,422,120 ; PCT Publication ( e.g. , an antibody) can be administered intravenously. Nos . WO 95/13796 ; WO 94/23697 ; WO 91/14445 ; and EP [ 0466 ] As used herein , the term “ sequential ” means , Patent No. 0524968. Additional approaches are described in unless otherwise specified , characterized by a regular Philip , Mol . Cell . Biol . ( 1994 ) 14 : 2411 , and in Woffendin , sequence or order, e.g. , if a dosage regimen includes the Proc . Natl . Acad . Sci . ( 1994 ) 91 : 1581 . The contents of each administration of an antibody and a small molecule , a of the foregoing are incorporated by reference herein for this sequential dosage regimen could include administration of purpose . the antibody before, simultaneously, substantially simulta [ 0461 ] It is also apparent that an expression vector can be neously, or after administration of the small molecule , but used to direct expression of any of the protein - based anti both agents will be administered in a regular sequence or cancer therapeutic agents ( e.g. , anti - cancer antibody ). For order. The term " separate ” means , unless otherwise speci example, peptide inhibitors that are capable of blocking fied , to keep apart one from the other . The term “ simulta ( from partial to complete blocking ) a cancer causing bio neously ” means , unless otherwise specified , happening or logical activity are known in the art . done at the same time , i.e. , the agents of the invention are [ 0462 ] In some embodiments, more than one anti - cancer administered at the same time . The term “ substantially therapeutic agent, such as an antibody and a small molecule simultaneously ” means that the agents are administered inhibitory compound , may be administered to a subject in within minutes of each other ( e.g. , within 10 minutes of each need of the treatment. The agents may be of the same type other) and intends to embrace joint administration as well as or different types from each other. At least one , at least two , consecutive administration , but if the administration is con at least three , at least four, or at least five different agents secutive it is separated in time for only a short period ( e.g. , may be co -administered . Generally anti - cancer agents for the time it would take a medical practitioner to administer administration have complementary activities that do not two agents separately ). As used herein , concurrent admin adversely affect each other. Anti - cancer therapeutic agents istration and substantially simultaneous administration are may also be used in conjunction with other agents that serve used interchangeably . Sequential administration refers to to enhance and / or complement the effectiveness of the temporally separated administration of the agents or thera agents. Treatment efficacy can be assessed by methods pies described herein . well - known in the art, e.g. , monitoring tumor growth or [ 0467 ] Combination therapy can also embrace the admin formation in a patient subjected to the treatment. Alterna istration of the anti - cancer therapeutic agent ( e.g. , an anti US 2021/0005284 A1 Jan. 7 , 2021 53 body ) in further combination with other biologically active tothecin , Topotecan , irinotecan / SN38 , rubitecan , Belotecan , ingredients ( e.g. , a vitamin ) and non - drug therapies ( e.g. , and other derivatives ; Topoisomerase II inhibitors, such as surgery or radiotherapy ). Etoposide ( VP - 16 ) , Daunorubicin , a doxorubicin agent ( e.g. , [ 0468 ] It should be appreciated that any combination of doxorubicin , doxorubicin hydrochloride, doxorubicin ana anti - cancer therapeutic agents may be used in any sequence logs , or doxorubicin and salts or analogs thereof in lipo for treating a cancer . The combinations described herein somes ) , Mitoxantrone , Aclarubicin , Epirubicin , Idarubicin , may be selected on the basis of a number of factors, which Amrubicin , Amsacrine , Pirarubicin , Valrubicin , Zorubicin , include but are not limited to the effectiveness of altering Teniposide and other derivatives; Antimetabolites, such as identified tumor type, reducing tumor formation or tumor Folic family ( Methotrexate, Pemetrexed , Raltitrexed , Ami growth , and / or alleviating at least one symptom associated nopterin , and relatives or derivatives thereof ); Purine with the cancer, or the effectiveness for mitigating the side antagonists ( Thioguanine , Fludarabine , Cladribine , 6 -Mer effects of another agent of the combination . For example, a captopurine, Pentostatin , clofarabine, and relatives or combined therapy as provided herein may reduce any of the derivatives thereof) and Pyrimidine antagonists ( Cytarabine , side effects associated with each individual members of the Floxuridine , Azacitidine , Tegafur, Carmofur, Capacitabine, combination , for example , a side effect associated with an Gemcitabine , hydroxyurea, 5 - Fluorouracil ( 5FU ) , and rela administered anti -cancer agent. tives or derivatives thereof) ; Alkylating agents, such as [ 0469 ] In some embodiments, an anti - cancer therapeutic Nitrogen mustards ( e.g. , Cyclophosphamide, Melphalan, agent is an antibody, an immunotherapy, a radiation therapy, Chlorambucil, mechlorethamine, Ifosfamide, mechlore a surgical therapy, and / or a chemotherapy . thamine, Trofosfamide, Prednimustine, Bendamustine, Ura [ 0470 ] Examples of the antibody anti - cancer agents mustine, Estramustine, and relatives or derivatives thereof ); include , but are not limited to , alemtuzumab ( Campath ), nitrosoureas ( e.g. , Carmustine, Lomustine , Semustine , Fote trastuzumab (Herceptin ), Ibritumomab tiuxetan ( Zevalin ) , mustine, Nimustine , Ranimustine, Streptozocin , and rela Brentuximab vedotin ( Adcetris ), Ado - trastuzumab tives or derivatives thereof) ; Triazenes ( e.g. , Dacarbazine , emtansine ( Kadcyla ) , blinatumomab ( Blincyto ) , Bevaci Altretamine, Temozolomide , and relatives or derivatives zumab ( Avastin ), Cetuximab ( Erbitux ), ipilimumab ( Yer thereof ); Alkyl sulphonates ( e.g. , Busulfan , Mannosulfan , voy ) , nivolumab ( Opdivo ) , pembrolizumab ( Keytruda ), Treosulfan , and relatives or derivatives thereof ); Procarba atezolizumab ( Tecentriq ), avelumab ( Bavencio ), dur zine ; Mitobronitol, and Aziridines ( e.g. , Carboquone , Tri valumab ( Imfinzi ), and panitumumab ( Vectibix ). aziquone, ThioTEPA , triethylenemalamine, and relatives or [ 0471 ] Examples of an immunotherapy include, but are derivatives thereof ); Antibiotics, such as Hydroxyurea , not limited to , a PD - 1 inhibitor or a PD - L1 inhibitor, a Anthracyclines ( e.g. , doxorubicin agent, daunorubicin , epi CTLA - 4 inhibitor, adoptive cell transfer therapy , therapeutic rubicin and relatives or derivatives thereof ); Anthracene cancer vaccines, oncolytic virus therapy, T - cell therapy , and diones ( e.g. , Mitoxantrone and relatives or derivatives immune checkpoint inhibitors. In some embodiments , an thereof ); Streptomyces family antibiotics ( e.g. , Bleomycin , immunotherapy may include a chimeric antigen receptor Mitomycin C , Actinomycin , and Plicamycin ); and ultravio ( CAR ) T - cell therapy. A CAR is designed for a T - cell and is let light. a chimera of a signaling domain of the T - cell receptor ( TCR ) complex and an antigen -recognizing domain ( e.g., a single EXAMPLES chain fragment ( scFv ) of an antibody ) ( Enblad et al . , Human [ 0476 ] In order that the invention described herein may be Gene Therapy. 2015 ; 26 ( 8 ) : 498-505 ) . In some embodi more fully understood , the following examples are set forth . ments , an antigen binding receptor is a chimeric antigen The examples described in this application are offered to receptor ( CAR ) . A T cell that expressed a CAR is referred to illustrate the methods, compositions , and systems provided as a “ CAR T cell .” A CAR T cell receptor, in some herein and are not to be construed in any way as limiting embodiments, comprises a signaling domain of the T - cell receptor ( TcR ) complex and an antigen - recognizing domain their scope. ( e.g. , a single chain fragment ( scFv ) of an antibody ) ( Enblad Example 1 : Workflow for WES and RNA et al . , Human Gene Therapy. 2015 ; 26 (8 ) :498-505 ) . Sequencing [ 0472 ] Examples of radiation therapy include , but are not limited to , ionizing radiation , gamma - radiation , neutron [ 0477 ] Provided below is an example of specimen collec beam radiotherapy, electron beam radiotherapy, proton tion from a subject having or suspected of having cancer, therapy, brachytherapy, systemic radioactive isotopes , and DNA and / or RNA extraction therefrom , DNA library prepa radiosensitizers. ration ( cDNA in the case of library preparation from RNA ), [ 0473 ] Examples of a surgical therapy include , but are not and data processing. limited to , a curative surgery ( e.g. , tumor removal surgery ), a preventive surgery , a laparoscopic surgery , and a laser Specimen Collection surgery . [ 0478 ] Prior to collection of biological samples from a [ 0474 ] Examples of the chemotherapeutic agents include , subject having or suspected of having a cancer, sufficient but are not limited to , Carboplatin or Cisplatin , Docetaxel, quantities of sterilized instruments, consumables , and Gemcitabine , Nab - Paclitaxel , Paclitaxel , Pemetrexed , and reagents ( e.g. , digest buffer ) were verified . Vinorelbine . [ 0479 ] For tumor tissue (bulk ), 30 mg of tumor tissue was [ 0475 ] Additional examples of chemotherapy include , but collected from a subject and put into a 2 ml cryogenic tube are not limited to , Platinating agents, such as Carboplatin , with RNA - later, the contents of which were then snap Oxaliplatin , Cisplatin , Nedaplatin , Satraplatin , Lobaplatin , frozen . The specimens were shipped on dry ice as needed . Triplatin , Tetranitrate, Picoplatin , Prolindac, Aroplatin and [ 0480 ] For blood samples ( which are considered “ normal other derivatives; Topoisomerase I inhibitors, such as Camp tissue” ( or non - cancerous )) , 0.5-1 mL of whole blood was US 2021/0005284 A1 Jan. 7. 2021 54 collected in an EDTA vacutainer collection tube ( plastic DNA and RNA Extraction from Bulk Biopsy preferred ) labeled with at least the sample ID and date / time Extraction of normal DNA and RNA . collected . The vacutainer tube was then placed into a sealed [ 0495 ] DNA from biopsy specimens was extracted using biohazard bag with absorbent materials. Whole blood in DSP DNA Midi Kit ( Qiagen® ) using an automated process EDTA was frozen on dry ice as needed and sent to a on the QIAsymphony (www.qiagen.com/us/shop/auto laboratory with other specimen ( s ) as needed . FIG . 1B illus mated - solutions / sample -preparation / qiasymphony - spas- in trates an embodiment of a process that includes the sample struments / ). collection process. [ 0496 ] A minimum of 1000-2000 ng of total DNA mass in Creation of Single Cell Suspension for CYTOF and RNA at least 10 ul volumes ( e.g. , 100-200 ng / ul in 10 ul mini seq ( SCS , optional , validation ) mum ) for each DNA sample was collected . Moreover, the [ 0481 ] The following steps were used to create single - cells extracted DNA solutions had 260/280 ratios of ~ 1.8 . suspensions ( SCS ) from tumor samples that were collected [ 0497 ] A minimum of 1000-6000 ng of total RNA mass in 50 mL of cold L - 15 medium ( lx ) . was collected . RNA Integrity Number ( RIN ) scores obtained [ 0482 ] 1 ) Transfer the container with tumor sample via Agilent’s BioAnalyzer or Tape Station were of at least 7 . from an operating room on ice to a biological safety hood for dissection , wherein it takes approximately Extraction of Tumor DNA and Tumor RNA . 60-90 min from surgical resection to the bench . [ 0498 ] DNA and RNA were extracted from 30 mg of [ 0483 ] 2 ) Transfer the tumor sample into a 100x15 mm tissue using AllPrep DNA /RNA Mini Kit by Qiagen® ( using petri dish containing fresh L - 15 medium . Using a the manual process described by the manufacturer ). curved scissor, dissect the tumor into fragments of 1-2 mm on a sterile petri dish with L - 15 to keep the tissue DNA / RNA Extraction and CYTOF for SCSs moist . To 50 mL conical tube containing 25 mL of [ 0499 ] Extraction : RNeasy Micro Plus Kit by Qiagen® enzyme cocktail add 0.5 gm of tumor tissue . (manual process ) was used . A minimum of 2,000,000 cells [ 0484 ] 3 ) Place the tube on a shaker at a speed of 85 rpm were used for extraction . Table 1 below shows that the RNA for 45 min at 37 ° C. concentration , yield , and quality drops substantially if RNA is extracted from a total of less than 2 million cells . It was [ 0485 ] 4 ) After 45 min , vigorously pipette the contents found that 2 million cells provided at least 1.8 ug of RNA , using a 10 mL pipette. Incubate for another 45 min which is sufficient for good quality RNAseq data ( i.e. , less under the same conditions . noise and better correlation between RNA expression within [ 0486 ] 5 ) After the incuba filter the sample through different isoforms of the same protein coding RNA ). It is a 70 um cell strainer into a new 50 mL conical tube . recommended to have more than 1 ug of RNA for better Using the back of a 3 mL syringe, gently apply pressure quality. on the cell strainer to disaggregate any remaining tissue . TABLE 1 [ 0487 ] 6 ) Add 25 mL of warm ( 37 ° C. ) L - 15 media containing 10 % FBS through the cell strainer, into the RNA quantity and quality as a function of the number of cells from 50 mL conical tube . which RNA is extracted . [ 0488 ] 7 ) Centrifuge at 300 g at room temperature for 5 Sample Number of RNA Concentration Volume Total Yield min . Decant the supernatant. ID Cells ( ng / uL ) ( UL ) (ng ) RIN BG002 2 million 70.2 26 1825 8.5 [ 0489 ] 8 ) Add 10 mL of warm ( 37 ° C. ) 1x eBioscience BG020 1 million 8.0 57.5 460 8.3 multi species RBC lysis buffer. Incubate in the dark for BG005 0.5 million 8.2 26 213 8.7 5 min at room temperature. BG008 0.5 million 8.0 26 208 8.4 [ 0490 ] 9 ) After the incubation add 40 mL of cold 1xPBS to the tube . Centrifuge at 300 g for 5 min at 4 ° C. CyTOF : Resuspend cells that are not going to be used for Decant supernatant. Add 10 mL of cold DMEM with RNAseq (minimum 5 million ) , in cold cell staining buffer 10 % FBS , and resuspend the pellet gently. ( CSB ) and place on ice in preparation for antibody labeling. [ 0491 ] 10 ) Centrifuge at 300 g for 5 min at 4 ° C. Decant the supernatant. Resuspend cells in 1 mL of cold L - 15 Library Preparation, RNA Sequencing, and WES with 10 % FBS . [ 0500 ] Illumina libraries were made and subjected to [ 0492 ] 11 ) Filter the sample through a 70 um cell quality control ( e.g. , using Tapestation D1000 High Sensi strainer into a new 50 mL conical tube . tivity DNA screen tape ) to evaluate their integrity and peak [ 0493 ] 12 ) Count cells using Trypan blue . Also assess size . The analysis consumed up to 1 ng library in 2 uL . viability using MoxiFlow ( 4 uL cells + 1960 MoxiFlow [ 0501 ] Whole Exome Sequencing ( WES ) on DNA Viability Reagent, use 75 uL to test ) . samples ( tumor tissue and germline blood ) was performed [ 0494 ] 2,000,000 cells were aliquoted into a 15 mL using Agilent Human All Exon V6 Capture ( 48.2 Mb ) or conical tube . Once the cells have been pelleted ( more Clinical Research Exome ( 54.6 Mb ). WES Illumina deep then 2 * 10 % ) , each lysate was resuspended in 500-750 ul sequencing was performed with standard NextSeq RNA - seq of RNAlater /RNA Protect in a 1.5 ml microcentrifuge configuration , Paired - End 100 bp Reads with an estimated tube . The 1.5 ml tube was placed into a 50 mL conical coverage > 100x . tube with tissue paper /paper towels on top to secure the [ 0502 ] RNA Sequence on RNA samples ( tumor tissue and 1.5 mLtubes. The 50 ml tube can then be shipped with SCS ) was performed using Ilumina TruSeq RNA Library the tumor specimen ( s ) on dry ice . Prep PCR enrichment of captured DNA ( Poly - A mRNA US 2021/0005284 A1 Jan. 7 , 2021 55 seq ) , non - stranded ( to compared data with that of The reviewing per tile quality score . In some embodiments , Cancer Genome Atlas ( TCGA )) paired - end 100 bp Reads quality control can be assured by reviewing per sequence ( 75 + 75 ) with an estimated coverage > 50 million paired -end GC content to identify contamination . In some embodi reads . ments , quality control can be assured by reviewing per base sequencing content to identify adapter and other contami PolyA Enrichment nation . In some embodiments, quality control can be assured [ 0503 ] Different RNA enrichment methods provide vari by reviewing sequence duplication levels as a measure of a ous enrichment of RNA transcripts. riboRNA depletion quality of RNA /DNA selection and PCR . In some embodi retains 10-50 % of non - coding transcripts ( e.g., rRNA , ments, quality control can be assured by reviewing adapter miRNA , long non -coding RNA ( LncRNA )) in the library. content. So , the percentages of protein - coding reads strongly vary [ 0507 ] In some embodiments, the data analysis pipeline depending on the method of RNA enrichment. In clinical may be stopped if the quality control cannot be assured for settings the focus was on expression of protein coding further analysis with sufficient confidence. For example, in transcripts . PolyA enrichment, compared to rRNA depletion , some embodiments, if read counts represent greater than a provided more stable and controllable percent of protein threshold value ( e.g. , > 20 min ) or Phred score represent coding transcripts ( FIG . 2 ) . greater than a threshold percentage ( e.g. , > 50 % green zone ) , [ 0504 ] Further, because PolyA enrichment was used , and the analysis pipeline is terminated . it was known that protein - coding RNA was enriched , RNA ( 3 ) Determine cross - species contamination . This can be sequencing was performed on non - stranded RNA . FIG . 2B performed by using any suitable software or tool to evaluate demonstrates that differences in RNA expression levels of the cross - species contamination . In some embodiments, IL24 , ICAM4, and GAPDH RNA seen when either stranded cross - species contamination can be determined by using or non - stranded RNA is used for sequencing. Fastq Screen ( e.g. , www.bioinformatics.babraham.ac.uk/ projects / fastq_screen /_build /html / index.html) . In some FASTQ Files Processing, and RNA Expression Assessment embodiments, cross - species contamination can comprise contamination from various species such as mouse , [ 0505 ] The raw data in the NextSeq BCL file format was zebrafish , drosophila, C. elegans, Saccharomyces, Arabi converted to the standard Illumina FASTQ format. As dopsis, microbiome, adapters, vectors , and phiX . In some described herein , any type of format that is suitable for embodiments, the data analysis pipeline may be stopped if further analysis can be used . In this example, the FASTQ the cross - species contamination is too severe for further data was subjected to quality control using standard quality analysis with sufficient confidence . For example , in some control algorithms ( e.g. , FastQ Screen (www.bioinformat embodiments, if contamination represents greater than a ics.babraham.ac.uk/projects/fastqc/ ), RSEQC ( rseqc.source threshold percentage ( e.g. , > 20 % ) , the analysis pipeline is forge.net/ ), and then processed to obtain expression per gene terminated . in TPM with no or minimal batch effects across samples. ( 4 ) Assure quality of the data based on various parameters . Data in the form of FASTQ files was delivered via a secure This can be performed by using any suitable software or tool SFTP server or Illumina BaseSpace . to evaluate the quality . In some embodiments, quality con trol can be assured by using Mosdepth ( e.g. , github.com/ Quality Control Steps Assuring Quality of FASTQ Files brentp /mosdepth ). In some embodiments, quality can be [ 0506 ] The following are steps involved in assuring qual assured by determining per coverage distribu ity control of the data in FASTQ files : tion ( as a sex prediction algorithm ). In some embodiments, ( 1 ) Remove low - quality reads. This can be performed by quality can be assured by determining o specific regions using any suitable software or tool to evaluate and / or coverage distribution ( e.g. , Collaborative Consensus Coding remove reads that are deemed of low - quality such as based Sequence ( CCDS ) , exons , etc. ) . In some embodiments, the on positional information . In some embodiments, low - qual data analysis pipeline may be stopped if the quality of the ity reads can be removed by using FILTERBYTILE ( e.g. , data is too low for further analysis with sufficient confi www.filterbytile.sh ( from BBmap ) ). In some embodiments, dence . For example, in some embodiments, if the confirma low - quality reads ( e.g. , bad tiles ) are removed from tion of coverage of clinically important genome regions is sequence files ( e.g. , FASTQ files ). In some embodiments, failed , the analysis pipeline is terminated . the data analysis pipeline may be stopped if the quality of the ( 5 ) Assure the presence and the quality of certain charac reads is too low for further analysis with sufficient confi teristics of the data . This can be performed by using any dence . For example, in some embodiments , if bad tiles suitable software or tool . In some embodiments, the pres represent greater than a threshold percentage ( e.g. , 50 % ) of ence or quality of certain characteristics of the data can be the sample , the analysis pipeline is terminated . assured by using Picard ( broadinstitute.github.io/picard/) . In ( 2 ) Assure quality control based on various parameters. This some embodiments, certain characteristics can be the num can be performed by using any suitable software or tool to ber of the percentage of the duplicities. In some embodi evaluate the confidence of the quality control. In some ments , certain characteristics can be mapped regions . In embodiments , quality control can be assured by using some embodiments , certain characteristics can be properly FastQC ( e.g. , www.bioinformatics.babraham.ac.uk/projects/ paired regions. fastqc /) . In some embodiments , quality control can be ( 6 ) Assure quality control based on various parameters. This assured by reviewing read counts as a measure of the can be performed by using any suitable software or tool to complexity of the library. In some embodiments, quality evaluate the confidence of the quality control. In some control can be assured by reviewing per base Phred quality embodiments, quality control can be assured by using score as a measure of sequencing quality of the platform . In RseQC ( e.g. , rseqc.sourceforge.net/ ). In some embodiments, some embodiments , quality control can be assured by quality control can be assured by reviewing strandedness US 2021/0005284 A1 Jan. 7 , 2021 56 analysis to prove stranded or non - stranded RNA - seq proto ments, if the HLA allele from a sample does not confirm the col . In some embodiments, quality control can be assured by source of the samples, the analysis pipeline is terminated . reviewing gene body coverage to detect coverage bias due to ( 11 ) Perform distribution analysis of expression for different the extraction protocol ( polyA / total RNA - seq ) and RIN . In transcripts types. This can be performed by using any some embodiments, quality control can be assured by suitable software or tools . In some embodiments , the tran reviewing read distribution of exons , introns , transcription scripts type can be Mt rRNA , Mt tRNA , lincRNA , miRNA , end sites ( TES ) , and transcription start sites ( TSS ) . In some misc RNA , protein coding , rRNA , SARNA , SnoRNA , embodiments , the data analysis pipeline may be stopped if ribozyme, Ig , processed , NMD , or retained . In some the quality control cannot be assured for further analysis embodiments, one or more transcripts type can be deter with sufficient confidence . For example, in some embodi mined . In some embodiments, the data analysis pipeline may ments, if duplicates represent greater than a threshold per be stopped if the transcript type is not suitable for further centage ( e.g. , < 60 % ) for RNA or less than a threshold analysis with sufficient confidence . For example , in some percentage ( e.g. , < 20 % ) for adapter contamination, the embodiments, if the transcripts represent a greater threshold analysis pipeline is terminated . percentage ( e.g. , > 70 % transcripts are protein - coding tran ( 7 ) Check cross - individual contamination by determining scripts ) , the analysis pipeline is terminated . concordance of a pair of samples ( e.g. , tumor / normal from the same patient ). This can be performed by using any Alignment suitable software or tool. In some embodiments , cross [ 0508 ] Alignment can be performed by using any suitable individual contamination can be determined by using Con software or tools . For example, a program for quantifying pair ( e.g. , github.com/nygenome/Conpair ). In some embodi transcripts, for example from bulk and single - cell RNA -Seq ments, the data analysis pipeline may be stopped if the data, using high - throughput sequencing reads ( e.g. , Kalliso cross - individual contamination is too severe for further available from Github, www.github.com , for example as analysis with sufficient confidence . For example , in some described in Nicolas L Bray, Harold Pimentel, Pall Melsted embodiments, if normal DNA does not match tumor DNA, and Lior Pachter, Near - optimal probabilistic RNA - seq quan the analysis pipeline is terminated . In some embodiments , if tification , Nature Biotechnology 34 , 525-527 ( 2016 ) , doi : large cross - individual contamination is detected , the analy 10.1038 /nbt.3519 ) was performed with input FASTQ files . sis pipeline is terminated . Kalisto indexing was performed based on : ( 8 ) Run a tumor type classifier . This can be performed by [ 0509 ] a . GRCh38 genome assembly ( no alt analysis ) with using any suitable software or tools . In some embodiments, a gene expression - based classifier can be used . For example, overlapping genes from the PAR locus removed . a gene expression - based classifier trained on RNAseq of [ 0510 ] ene annotation based on GENCODE V23 previously sequenced tumors of different tissue types can be comprehensive annotation ( regions ALL ) (www.gencode used to classify tumor type. Examples of such classifiers are genes.org ) described herein and in U.S. Provisional Patent Application [ 0511 ] Files with transcript expression in TPM (Tran Ser. No. 62 / 943,976 , titled “ Machine Learning Techniques scripts Per Kilobase Million ) were obtained thereafter. for Gene Expression Analysis , " filed on Dec. 5 , 2019 , which is incorporated by reference herein in its entirety . In some Removing of Non - Coding Transcripts for the Data and embodiments, this allows the prediction of the tumor type Other Biases from RNA - seq data on the basis of the gene expression data . [ 0512 ] The Transcripts Per Million ( TPM ) expression In some embodiments, the data analysis pipeline may be allows presentation of gene expression in the format of stopped if the tumor type is a mismatch for further analysis concentration ( in 1 million of transcripts ). This allows with sufficient confidence . For example, in some embodi comparison of samples with different coverage and RNA ments, if the asserted tumor type from clinicians does not sequencing depth . match the determined tumor type, the analysis pipeline is [ 0513 ] TPM uses correction of the read counts by the terminated . length of each gene in bases , so it can create a great bias in ( 9 ) Predict library type. This can be performed by using any the samples with uneven distribution of non - coding tran suitable software or tools . In some embodiments , RNA - seq scripts after TPM calculation because some non - coding type classifier can be used . In some embodiments, the transcripts (miRNA , snRNA , snoRNA ) have very small RNA - seq type classifier can be a gene expression - based transcript length . FIG . 3 shows the biases that are created classifier on XGboost ( e.g. , xgboost.readthedocs.io/en/lat upon TPM calculation . est / ) trained model. In some embodiments, the prediction of [ 0514 ] To remove batches based on uneven distribution of library type is based on expression of specific genes from the non - coding transcripts in RNA library, non - coding tran RNA - seq data . In some embodiments, the data analysis scripts were removed from the data before further RNA pipeline may be stopped if the library type is a mismatch for expression quantification . further analysis with sufficient confidence . For example , in Excluded types included : some embodiments , if the asserted library type does not { pseudogene, polymorphic_pseudogene, processed match the determined library type ( e.g. , total RNA - seq , or pseudogene, transcribed_processed_pseudogene , unitary_p polyA - RNA - seq ), the analysis pipeline is terminated . seudogene , unprocessed_pseudogene, transcribed_unitary ( 10 ) Check concordance of HLA allele . This can be per pseudogene, IG_C_pseudogene, IG_J_pseudogene, IG_V_ formed by using any suitable software or tools . In some pseudogene, transcribed_unprocessed_pseudogene , embodiments, MHC allele composition can be determined . translated_unprocessed_pseudogene TR_J_pseudogene, In some embodiments , the data analysis pipeline may be TR_V_pseudogene stopped if the HLA allele is a mismatch for further analysis SARNA , snoRNA , miRNA with sufficient confidence . For example , in some embodi Ribozyme, rRNA , Mt_tRNA , Mt_rRNA , scaRNA US 2021/0005284 A1 Jan. 7 , 2021 57 retained_intron , sense_intronic , sense_overlapping conducted by lab personnel. In the event that quality control nonsense_mediated_decay , non_stop_decay related issues arise , lab personnel will notify the provider Antisense , lincRNA , macro_lncRNA ( e.g. , healthcare provider) of the cells or tissue ( e.g. , PMBCs processed_transcript, 3prime_overlapping_nerna or cell suspensions) . SRNA , misc_RNA , vaultRNA , TEC } Retained types: Preparation of Reagents { protein_coding, [ 0520 ] Reagents for the extraction of DNA and / or RNA from a sample were prepared according to the manufactur Ig ( IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene) er's instruction, including AllPrep DNA / RNA Mini hand book and AllPrep DNA / RNA Micro handbook , the contents TCR ( TR_C_gene, TR_D_gene, TR_J_gene, TR_V_gene ) } of which are incorporated by reference herein . Some of the [ 0515 ] In addition to removing non - coding transcripts, processes may be customized based on the requirements of genes that were found to have the highest variance between the nucleic acids of a given sequencing platform . In general, PolyARNA sequencing and total RNA sequencing were also B -mercaptoethanol ( B - ME ) was added to Buffer RLT Plus removed . Such genes included ( 1 ) histone - encoding genes , before use . 10 uL B - ME per 1 mL Buffer RLT Plus was and ( 2 ) mitochondria - related genes , having very long or added . The lab personnel who conducted the preparation of very short PolyA tails , which result in uneven enrichment of the reagents wore appropriate Personal Protective Equip the transcripts. ment ( PPE ) and the reagents were dispensed in a fume hood . [ 0516 ] FIG . 4A shows the variation in the length of PolyA Buffer RLT Plus was generally stable at room temperature tails for different histone - encoding genes. FIG . 4B shows a for 1 month after addition of B - ME . The date of the addition comparison of expression of histone coding and mitochon of B - ME and the 1 -month expiration date were marked on drial genes within samples in which RNA was enriched by the bottle . either polyA enrichment or by ribo -RNA depletion ( total [ 0521 ] Buffer RPE , Buffer AW1, and Buffer AW2 were RNA ). Genes that are excluded are described in the present each supplied as a concentrate by the manufacturer. Before disclosure ( e.g. , transcripts from protein non - coding regions, using for the first time , appropriate volume of 100 % ethanol histone - encoding genes , and mitochondria - related genes ). was added, as indicated on the bottle , to obtain a working solution . The solutions were appropriately labeled per the Gene Aggregation and TPM Normalization Solutions and Reagent Labeling Standard Operating Proce [ 0517 ] Expression per gene was calculated as a sum of the dure ( SOP ) as described herein . Buffer RLT Plus may form expression of the transcripts for the gene . Gene expression a precipitate during storage . If necessary , the precipitates data was normalized to the total number of transcripts in the formed in Buffer RLT Plus were dissolved by warming in a million ). This procedure allows correction for major batch 37 ° C. water bath until precipitates were dissolved . The effects associated with library preparation , uneven RNA Buffer RLT Plus without precipitates was then place at room transcript distribution between samples, and correction for temperature . Prolonged incubation in the water bath was not recommended . It was noted that Buffer RLT Plus , Buffer RNA enrichment method ( FIG . 5 ) . RW1, and Buffer AW1 contained a guanidine salt . Example 2 : DNA and RNA Extraction from Peripheral Blood Mononuclear Cells ( PBMC ) or Preparation of Material for Extraction Cell Suspensions [ 0522 ] Before the start of extraction , tubes and columns [ 0518 ] To prepare nucleic acid materials for downstream were labeled with a specimen ID for each sample being sequencing analysis, DNA and / or RNA was extracted from processed . Frozen cell pellets were thawed slightly, so they a single PBMC cell pellet or suitable cell suspensions . In were dislodged by flicking the tube. Cell lysates were brief, AllPrep DNA /RNA assay kits ( Qiagen® ) were used incubated at 37 ° C. in a water bath until completely thawed . for purifying genomic DNA and total RNA simultaneously Prolonged incubation was discouraged, due to its potential from a single biological sample . Biological samples were compromise on RNA integrity. For pelleted cells , the cell first lysed and homogenized in a highly denaturing guani pellet was loosened thoroughly by flicking the tube . This dine- isothiocyanate - containing buffer, which immediately was an important step for properly preparing the nucleic acid inactivated DNases and RNases to ensure isolation of intact materials because incomplete loosening of the cell pellet DNA and RNA . The lysate was then passed through an may lead inefficient lysis and reduced nucleic acid yields . AllPrep DNA spin column. This column, in combination Appropriate volume of Buffer RLT Plus was added, fol with the high - salt buffer, allowed selective and efficient lowed by vortexing or pipetting to mix . In general, for binding of genomic DNA . The column was washed and the < 5x105 cells , 350 uL Buffer RLT Plus was added . For DNA was then eluted . Alternatively, the lysate that passed 5x105-1x107 cells , 600 uL Buffer RLT Plus was added . through the AllPrep DNA spin column went through an [ 0523 ] The lysate was homogenized by using QIAshred RNeasy spin column to selectively isolate RNA . der. In brief, the lysate was pipetted directly into a [ 0519 ] In some circumstances, for further improving the QIAshredder spin column placed in a 2 mL collection tube . quality of the starting RNA , ethanol was added to the The lysate was then pipetted and centrifuged for 2 minutes flow - through from the AllPrep DNA spin column to provide at maximum speed ( 18,565xg ) . The homogenized lysate was appropriate binding conditions for RNA . The sample was transferred to an AllPrep DNA spin column placed in a 2 mL then applied to an RNeasy spin column, where total RNA collection tube . The lid was closed gently, and the spin was bound to the membrane and contaminants were washed column was centrifuged for 30s at 28000x g . After centrifu away . High - quality RNA was then eluted in water . Some of gation, any remaining liquid on the column membrane was the steps and / or the entire procedure can be managed and checked and removed . If necessary , the centrifugation step US 2021/0005284 A1 Jan. 7 , 2021 58 was repeated until all liquid was passed through the mem AW2 was added to the AllPrep DNA spin column. The lid brane . The AllPrep DNA spin column was placed in a new was closed gently , and was centrifuged for 2 min at full 2 ml collection tube and was stored at room temperature or speed ( 18,565xg ) to wash the spin column membrane . After at 4 ° C. ( not in the freezer ) for later DNA purification. The centrifugation, the AllPrep DNA spin column was carefully flow -through for RNA purification was used . removed from the collection tube. If the column contacted the flow - through , the collection tube was emptied and the Total RNA Purification spin column was centrifuged again for 1 min at full speed . [ 0524 ] For purifying RNA , 600 uL of 70 % ethanol was [ 0529 ] If < 5x10 cells were processed , the AllPrep DNA added to the flow through from the previous step , and was spin column was placed in a new 1.5 mL collection tube . 50 mixed well by pipetting. Up to 700 uL of the sample was uL Buffer EB was added ( preheated to 70 ° C. ) directly to the immediately transferred , including any precipitate that was spin column membrane and the lid was closed and was formed which might be visible , to an RNeasy spin column incubated at room temperature for 2 min . The spin column placed in a 2 ml collection tube . The lid of the collection was centrifuged for 1 min at 28000x g to elute the DNA . tube was closed gently and was centrifuged for 15 s at Repeat addition of Buffer EB was conducted and centrifu 28000x g . The flow - through was discarded . In the event that gation to elute further DNA . A new 1.5 mL collection tube the sample volume exceeded 700 uL , successive aliquots was used to collect the second DNA eluate , and then was were centrifuged in the same RNeasy spin column . The combined with the first eluate. The spin column was dis flow - through was discarded after each centrifugation . The carded , and was stored in the 1.5 mL tube with extracted collection tube was re -used in the following step . DNA at 4 ° C. until further processing. [ 0525 ) 700 uL Buffer RW1 was added the RNeasy spin [ 0530 ] If > 5x10 % cells were processed, the AllPrep DNA column. The lid was closed gently, and was centrifuged for spin column was placed in a new 1.5 mL collection tube . 50 15 s at 28000x g to wash the spin column membrane . The ?L Buffer EB was added directly to the spin column mem flow -through was discarded . The collection tube was re - used brane and the lid was closed . The spin column was incubated in the following step . 500 uL Buffer RPE was added to the at room temperature for 1 min was centrifuged for 1 min at RNeasy spin column . The lid was closed gently, and was 28000x g to elute the DNA . Repeat addition of Buffer EB centrifuged for 15 s at 28000x g to wash the spin column was conducted and centrifugation to elute further DNA . A membrane . The flow - through was discarded . new 1.5 mL collection tube was used to collect the second [ 0526 ] In general, if < 5x109 cells were processed , 500 uL DNA eluate , and then was combined with the first eluate . of 80 % ethanol was added to the RNeasy MinElute spin The spin column was discarded , and was stored in the 1.5 column. The lid was closed gently, and the spin column was mL tube with extracted DNA at 4 ° C. until further process centrifuged for 2 min at 28000x g to wash the spin column ing . membrane. The collection tube with the flow - through was [ 0531 ] Troubleshooting processes included , but were not discarded . The RNeasy MinElute spin column was placed in limited to the following in Table 3 . a new 2 mL collection tube . The lid of the spin column was opened, and the spin column was centrifuged at full speed TABLE 3 ( 18,565xg ) for 5 minutes . The collection tube with the flow - through was discarded . The RNeasy MinElute spin Troubleshooting Steps for AllPrep DNA / RNA Procedure column was placed in a new 1.5 mL collection tube. 14 uL Incident Troubleshooting steps RNase - free water was directly added to the center of the spin Clogged Clogged column can be caused by the following : column membrane . The lid was closed gently , and was AllPrep DNA a ) Inefficient disruption and / or homogenization centrifuged for 1 min at full speed ( 18,565xg ) to elute the and RNeasy 1. Increase g - force and centrifugation time if necessary RNA . The spin column was discarded , and the 1.5 mL tube spin column 2. In subsequent preparations, reduce the amount of was stored with extracted RNA at -80 ° C. until further starting material and / or increase the homogenization time . processing. b ) Too much starting material [ 0527 ] If > 5x10 cells were processed , 500 uL Buffer RPE 1. Reduce the amount of starting material. It is essential to use the correct amount. was added to the RNeasy spin column . The lid was closed c ) Centrifugation temperature too low gently, and was centrifuged for 2 min at 28000x g to wash 1. The centrifugation temperature should be 20-25 ° C. the spin column membrane . The RNeasy spin column was Some centrifuges may cool placed in a new 2 mL collection tube . The old collection tube Low nucleic Low nucleic acid yield can be caused by the following : acid yield a ) Inefficient disruption and / or homogenization with the flow - through was discarded . The collection tube 1. In subsequent preparations , reduce the amount of was then centrifuged at full speed ( 18,565xg ) for 1 min . The starting material and / or increase the homogenization RNeasy spin column was placed in a new 1.5 mL collection time. tube. 30-50 uL of RNase - free water was added directly to the b ) Too much starting material spin column membrane. The lid was closed gently, and was 1. Reduce the amount of starting material. It is essential to use the correct amount. centrifuge for 1 min at 28000x g to elute the RNA . c ) RNA still bound RNeasy spin column membrane 1. Repeat RNA elution , but incubate the RNeasy spin Genomic DNA Purification column on the benchtop for 10 min with RNase - free water before centrifuging [ 0528 ] 500 uL Buffer AW1 was added to the AllPrep DNA d ) DNA still bound to AllPrep DNA spin column membrane spin column ( previously placed in a new 2 ml collection tube 1. Repeat DNA elution , but incubate the AllPrep DNA and stored at room temperature or at 4 ° C. ) . The lid was spin column on the benchtop for 10 minutes with Buffer closed gently, and the spin column was centrifuged for 15 s EB before centrifuging. at 28000x g . The flow - through was discarded . The spin e ) Ethanol carryover column was re - used in the following step . 500 uL Buffer US 2021/0005284 A1 Jan. 7 , 2021 59

Example 3 : The Constructions of DNA Libraries TABLE 3 - continued for Sequencing Troubleshooting Steps for AllPrep DNA /RNA Procedure [ 0532 ] DNA libraries were prepared before performing the Incident Troubleshooting steps downstream sequencing. In brief, library Construction ( LC ) 1. During the second wash with Buffer RPE , be sure to consisted of shearing extracted genomic DNA to a pre centrifuge at 8000 ~ g for 2 min at 20-25 ° C. to dry the determined size ( e.g. , 200 base pairs ) , and then prepared the RNeasy spin column membrane . libraries for Hybrid Capture . Fragmented DNA was 2. Perform the optional centrifugation to dry the RNeasy spin column membrane if any flow -through is present on repaired , and unique molecular barcodes were added to each the outside of the column . DNA sample , so that each DNA sample could be identified DNA This can be caused by the following: during sequencing. DNA samples were purified before contaminated a ) Lysate applied to the AllPrep DNA spin column amplifying the barcoded libraries with Polymerase Chain with RNA contains ethanol 1. Add ethanol to the lysate after passing the lysate Reaction ( PCR ) . The DNA samples were then purified again through the AllPrep DNA spin column . before the amount and quality of each library was assessed b ) Sample is affecting pH of homogenate using Quality Control ( QC ) steps described according to 1. The final homogenate should have a pH of 7. Make manufacturer instructions . sure that the sample is not highly acidic or basic . Contamination This can be caused by the following: [ 0533 ] In general, library Construction consisted of four of RNA with a ) Cell number too high main steps . First , genomic DNA was sheared to about 200 DNA affects 1. For some cell types , the efficiency of DNA binding to base pairs using the SureSelect XT HS Enzymatic Fragmen downstream the AllPrep DNA spin column may be reduced when applications processing very high cell numbers . tation Kit . The shearing resulted in DNA fragments that b ) Tissue has high DNA content needed to undergo blunt end repair. The second step was the 1. For certain tissues with extremely high DNA content repairing and dA - tailing of the DNA ends . This step added ( e.g. , thymus ), some DNA will pass through the AllPrep an “ A ” base to the 3 ' end of a blunt phosphorylated DNA DNA spin column . Try using smaller samples . fragment. This treatment created compatible overhangs for Alternatively , perform DNase digestion on the RNeasy the next step of DNA sample preparation . In the third step , spin column membrane , or perform DNase digestion of the eluted RNA followed by RNA cleanup . specific molecular -barcoded adaptors were ligated to each Low A260 / A280 Use 10 mM Tris HCl , pH 7.5 , not RNase - free water , to sample using the “ A ” base overhang created in the last step . value in RNA dilute the sample before measuring purity. Adapters were platform - specific sequences for fragment eluate RNA degradation can be caused by the following : recognition by the sequencer : for example , the P5 and P7 RNA degraded a ) Inappropriate handling of starting material sequences enabled library fragments to bind to the flow cells 1. Ensure that tissue samples are properly stabilized and stored in RNAlater RNA Stabilization Reagent. of Illumina platforms. The molecular barcode was unique to 2. Ensure that frozen tissue was flash - frozen immediately each sample being run and allowed multiple samples to be in liquid nitrogen and properly stored at -70 ° C. Perform subsequently mixed together, with the barcode used to the AllPrep DNA /RNA procedure quickly , especially the identify each sample at sequencing . The samples were then first few steps. purified using AMPure XP beads . In the final step , the b ) RNase contamination adaptor - ligated libraries were amplified with PCR , and then 1. Although all AllPrep buffers have been tested and are guaranteed RNase - free, RNases can be introduced during purified a second time using AMPure XP beads . Some or all use . Be certain not to introduce any RNases during the of the procedures were managed and conducted by lab AllPrep DNA /RNA procedure or later handling. personnel. In the event that quality control related issues DNA This can happen when homogenization is too vigorous. arise , lab personnel will notify the provider ( e.g. , healthcare fragmented The length of purified DNA depends strongly on the provider ) of the biopsy sample or the extracted DNA . homogenization conditions . If longer DNA fragments are required , keep the homogenization time to a minimum or use a gentler homogenization method if possible . Normalization of Samples for Library Construction Nucleic acid The elution volume may be too high . Elute nucleic acids [ 0534 ] Samples were normalized to 10-200 ng in 7 UL , concentration in a smaller volume . Do not use less than 50 uL Buffer using low TE . The maximum amount of DNA available was too low EB for the AllPrep DNA spin column , or less than 1 x 30 uL of water for the RNeasy spin column . Although used for each sample, within the range provided . The lab eluting in smaller volumes results in increased nucleic personnel then navigated to the normalization spreadsheet, acid concentrations, yields may be reduced . which was located in the Clinical Lab Documents folder in Nucleic acids a ) Salt carryover during elution the shared Google Drive. The tab labeled “ LC Normaliza do not perform 1. Ensure that buffers are at 20-30 ° C. tion ” was selected . The Sample ID was entered into column well in 2. Ensure that the correct buffer is used for each step of A. The measured concentration was entered into column B. downstream the procedure . experiments 3. When reusing collection tubes between washing steps , The spreadsheet automatically calculated the volumes of remove residual flow - through from the rim by blotting on sample and low TE required for normalization in columns G clean paper towels . and H. If the concentration of a sample was on the lower b ) Ethanol carryover side , the spreadsheet calculated a volume of sample > 7 uL 1. During the second wash with Buffer RPE , be sure to centrifuge at 28000 x g for 2 min at 20-25 ° C. to dry the and a volume of low TE < 0 uL . If this occurred , only 7 UL RNeasy spin column membranes . After centrifugation , of sample was used and was not diluted . The volumes carefully remove the column from the collection tube so calculated in the spreadsheet were used for normalizing the that the column does not contact the flow - through . appropriate volumes into a 96 well semi - skirted PCR plate . Otherwise, carryover of ethanol will occur. 2. Perform the optional centrifugation to dry the RNeasy Enzymatic DNA Shearing spin column membrane if any flow - through is present on the outside of the column. [ 0535 ] In some embodiments , DNA is fragmented using an endonuclease ( e.g. , using an enzymatic fragmentation kit from SureSelect ). In some embodiments , a SureSelect Frag US 2021/0005284 A1 Jan. 7 , 2021 60 mentation Buffer and Enzyme were thawed on ice . Frag Repair A - Tailing Enzyme Mix , a 180 uL reaction volume for mentation Buffer was vortexed and spun down before use . A 8 reactions ( including excess ) containing 144 uL of End 3 ul Fragmentation master mix for each sample was pre Repair A - Tailing Buffer and 36 uL of End Repair A - Tailing pared using 2 uL of 5x SureSelect Fragmentation Buffer Enzyme Mix , a 500 uL reaction volume for 24 reactions mixed with 1 uL SureSelect Fragmentation Enzyme. In ( including excess ) containing 400 uL of End Repair A - Tail some embodiments, larger volumes can be prepared for ing Buffer and 100 uL of End Repair A - Tailing Enzyme Mix . multiple reactions ( e.g. , 18 uL of 5x SureSelect Fragmen [ 0542 ] The End Repair - A Tailing Buffer was slowly tation Buffer mixed with 9 ° L SureSelect Fragmentation pipetted into a 1.5 mL Eppendorf tube , ensuring that the full Enzyme for 8 reactions including excess ). volume was dispensed. The End Repair - A Tailing Enzyme [ 0536 ] 3 ul Fragmentation master mix was added to each Mix was slowly added , rinsing the enzyme tip with buffer sample well and was mixed by pipetting up and down 20 solution after addition and was mixed well by pipetting up times . The plate was immediately placed on the thermal and down 15-20 times or sealed the tube and vortexed at cycler on the Enzymatic Fragmentation program ( step 1 : 37 ° high speed for 5-10 seconds. The liquid was spun briefly to C. for 15 minutes ; step 2 : 65 ° C. for 5 minutes and step 3 : collect and was kept on ice . 20 uL of the End Repair /dA 4 ° C. on hold ). Tailing master mix was added to each sample well contain Repair and dA - Tail the Fragmented DNA Ends ing approximately 50 uL of fragmented DNA and was mixed [ 0537] In some embodiments, fragmented DNA is by pipetting up and down 15-20 times using a pipette set to repaired and dA - tailed , for example using a kit from Sure 60 uL or capped the wells and vortexed at high speed for Select . In some embodiments , reagents were first thawed on 5-10 seconds . The samples were briefly spun and then the ice ( e.g. , from -20 ° C. storage ) and Agencourt AMPure XP plate or strip tube was immediately placed in the Thermal beads were equilibrated to room temperature for at least 30 cycler and started the End Repair /dA - Tailing program ( step minutes. End Repair A - Tailing Buffer, Ligation Buffer, End 1 : 20 ° C. for 15 minutes; step 2 : 72 ° C. for 15 minutes and Repair A - Tailing Enzyme Mix , T4 DNA Ligase , and Adap step 3 : 4 ° C. On hold ). tor Oligo Mix ( all from SureSelect XT HS Library Prepa ration Kit for ILM ) were mixed by vortexing. Ligate the Molecular - Barcoded Adaptor [ 0538 ] In some embodiments, a ligation master mix was prepared . The thawed vial of Ligation Buffer was vortexed [ 0543 ] Once the thermal cycler reached the 4 ° C. Hold for 15 seconds at high speed to ensure homogeneity . The step , the samples were transferred to ice while setting up this Ligation Buffer used in this step was viscous and was mixed step . To each end -repaired /dA - tailed DNA sample ( approxi thoroughly by vortexing at high speed for 15 seconds before mately 70 uL ) , 25 uL of the Ligation master mix that was removing an aliquot for use . When combined with other prepared previously was added , kept at room temperature , reagents, the Ligation Buffer was mixed well by pipetting up and was mixed by pipetting up and down at least 10 times and down 15-20 times using a pipette set to at least 80 % of using a pipette set to 85 uL or capped the wells and vortexed the mixture volume or by vortexing at high speed for 10-20 at high speed for 5-10 seconds . The samples were briefly seconds . A flat- top vortex mixer was used when vortexing spun . Silt of Adaptor Oligo Mix (white capped tube) was strip tubes or plates throughout the protocol . When reagents added to each sample and was mixed by pipetting up and were mixed by vortexing, the occurrence of adequate mixing down 15-20 times using a pipette set to 85 uL or capped the was visually verified . wells and vortexed at high speed for 5-10 seconds . The [ 0539 ] In some embodiments, an appropriate volume of Ligation master mix and the Adaptor Oligo Mix were added Ligation master mix was prepared by combining reagents as to the samples in separate addition steps as directed in the follows: a 25 uL reaction volume for 1 reaction containing steps above , mixing after each addition . The samples were 23 uL of Ligation Buffer and 2 uL of T4 DNA Ligase , a 225 briefly spun and the plate or strip tube was then immediately uL reaction volume for 8 reactions ( including excess ) con placed in the thermal cycler and bean the Ligation program taining 207 uL of Ligation Buffer and 18 uL of T4 DNA ( step 1 : 20 ° C. for 30 minutes; step 2 : 4 ° C. on hold ) . The Ligase , a 625 uL reaction volume for 24 reactions ( including sample wells were sealed and were stored overnight at either excess ) containing 575 uL of Ligation Buffer and 50 uL of 4 ° C. or -20 ° C. if next steps were not continued . T4 DNA Ligase. [ 0540 ] The Ligation Buffer was slowly pipetted into a 1.5 Purify the Samples Using AMPure XP Beads mL Eppendorf tube , ensuring that the full volume was [ 0544 ] The AMPure XP beads were verified and held at dispensed . The T4 DNA Ligase was slowly added , rinsing room temperature for at least 30 minutes before use . The the enzyme tip with buffer solution after addition and was beads were not frozen at any time . 400 uL of 70 % ethanol mix well by slowly pipetting up and down 15-20 times or per sample was prepared , plus excess, for use in the follow sealed the tube and vortexed at high speed for 10-20 ing steps. The freshly -prepared 70 % ethanol may be used for seconds. The liquid was spun briefly to collect the liquid , subsequent purification steps run on the same day. The which was kept at room temperature for a minimum of 30 complete Library Preparation protocol required 0.8 ml of minutes , but not more than 45 minutes before use . fresh 70 % ethanol per sample . The AMPure XP bead sus [ 0541 ] The thawed vial of End Repair - A Tailing Buffer pension was mixed well so that the reagent appeared homo was thawed for 15 seconds at high speed to ensure homo geneous and consistent in color. 80 uL of homogeneous geneity. The solution was visually inspected . If any solids AMPure XP beads were added to each DNA sample (ap were observed , vortexing was continued until all solids were proximately 100 uL ) in the PCR plate or strip tube and were dissolved . The appropriate volume of End Repair /dA - Tail pipetted up and down 15-20 times or capped the wells and ing master mix was prepared by combining the following vortexed at high speed for 5-10 seconds to mix . Samples reagents : a 20 uL reaction volume for 1 reaction containing were incubated for 5 minutes at room temperature . The plate 16 uL of End Repair A - Tailing Buffer and 4 uL of End or strip tube was put into a magnetic separation device US 2021/0005284 A1 Jan. 7 , 2021 61

( DynaMag -96 Side Magnet) and was waited for the solu of 5x Herculase II Reaction Buffer, 12.5 uL of 100 mM tion to clear (approximately 5 to 10 minutes ). The plate or dNTP Mix , 50 uL of Forward Primer, and 25 uL of 5x strip tube was placed in the magnetic stand . The cleared Herculase II Fusion DNA . solution from each well was carefully removed and dis [ 0547 ] 13.5 uL of the PCR reaction mixture was added to carded . The beads were not touched while removing the each purified DNA library sample ( 34.5 uL ) in the PCR plate solution . The plate or strip tube was continued to keep in the wells . 2 uL of the appropriate SureSelect XT HS Index magnetic stand while 200 uL of freshly - prepared 70 % Primer was added to each reaction . The wells were capped ethanol in each sample well was dispensed . Any disturbed and were then vortex at high speed for 5 seconds . The plate beads were allowed to settle after 1 minute and the ethanol or strip tube was spun briefly to collect the liquid and any was removed . The plate or strip tube was placed in the bubbles were released . Before adding the samples to the magnetic stand while you dispense another 200 uL of thermal cycler, the Pre -Capture PCR program was started freshly - prepared 70 % ethanol in each sample well . Any according to the conditions below to bring the temperature disturbed beads were allowed to settle after 1 minute and the of the thermal block to 98 ° C. Once the thermal cycler ethanol was removed . The wells were sealed with strip caps , reached 98 ° C. , the sample plate or strip tube was immedi and the samples were then briefly spun to collect the residual ately placed in the thermal block and the following tem ethanol . The plate or strip tube was returned to the magnetic perature cycling protocol was performed . stand for 30 seconds . The residual ethanol was removed with a P20 pipette . The samples were air dried for 5 minutes . The bead pellet was not dried to the point that the pellet appeared Segment Number of Cycles Temperature Time cracked during any of the bead drying steps in the protocol. 1 1 98 ° C. 2 minutes Elution efficiency was significantly decreased when the bead 2 8 98 ° C. 30 seconds pellet was excessively dried . 35 uL nuclease - free water was 60 ° C. 30 seconds added to each sample well . The wells were sealed with strip 72 ° C. 1 minute caps , then were mixed well on a vortex mixer and the plate 3 1 72 ° C. 5 minutes or strip tube was briefly spun to collect the liquid and was 4 1 4 ° C. Hold incubated for 2 minutes at room temperature . The plate or strip tube was put in the magnetic stand and was left for Purify the Amplified Library with AMPure XP Beads approximately 5 minutes , until the solution was clear . The [ 0548 ] The AMPure XP beads were verified to be held at cleared supernatant ( approximately 34.5 uL ) was removed room temperature for at least 30 minutes before use . 400 L to a fresh PCR plate or strip tube sample well and was kept of 70 % ethanol per sample was prepared, plus excess . The on ice . The beads could be discarded at this time . It was AMPure XP bead suspension was mixed well , so that the noted that it may not be possible to recover the entire 34.5 reagent appeared homogeneous and consistent in color. 50 uL supernatant volume at this step . The maximum possible uL of homogenous AMPure XP beads were added to each amount of supernatant was transferred for further process amplification reaction in the PCR plate or strip tube and was ing . To maximize recovery , the cleared supernatant was pipetted up and down 15-20 times to mix . Samples were transferred to a fresh well in two rounds of pipetting, using incubated for 5 minutes at room temperature . The plate was a P20 pipette set at 17.25 out into a magnetic separation device (DynaMag -96 Side Amplify the Adaptor - Ligated Library Magnet ) and was waited up to 5 minutes for the solution to clear. The plate or strip tube was put on the magnetic stand [ 0545 ] The following PCR reagents from the SureSelect and the cleared solution from each well was carefully XT HS Library Preparation Kit for ILM ( PrePCR ) were removed and discarded . The beads were touched while thawed , mixed and kept on ice . Herculase II Fusion DNA removing the solution . The plate or strip tube was continued Polymerase was mixed by pipetting up and down 15-20 to be kept in the magnetic stand while dispensing 200 uL of times . 5x Herculase II Reaction Buffer was mixed by freshly - prepared 70 % ethanol into each sample well . Dis vortexing. 100Mm dNTP Mix was mixed by vortexing. turbed beads were allowed to settle after the wait for 1 Forward Primer and SureSelect XT HS Index Primers A01 minute , then removed the ethanol. The ethanol wash was through H04 were separately mixed by vortexing. The repeated once . The wells were sealed with strip caps , then appropriate index assignments for each sample were deter the samples were briefly spun to collect the residual ethanol. mined . The SureSelect XT HS Index Primers were provided The plate or strip tube was returned to the magnetic stand for in single - use aliquots. To avoid cross - contamination of 30 seconds. The residual ethanol was removed with a P20 libraries, each vial was discarded after use in one library pipette. The samples were dried by keeping the unsealed preparation reaction . Residual volume was not re - used or plate or strip tube at room temperature for up to 5 minutes, retained for subsequent experiments . until the residual ethanol was just evaporated . 15 uL nucle [ 0546 ] Appropriate volume of pre - capture PCR reaction ase - free water was added to each sample well . The wells mix was prepared as described below on ice and then was were sealed with strip caps , then were mixed well on a mixed well on a vortex mixer. For example , a 13.5 uL vortex mixer and the plate or strip tube was briefly spun to reaction volume for 1 reaction contained 10 uL of 5x collect the liquid and was incubate for 2 minutes at room Herculase II Reaction Buffer, 0.5 uL of 100 mM dNTP Mix , temperature. The plate or strip tube was put in a magnetic 2 uL of Forward Primer, and 1 uL of 5x Herculase II Fusion stand and was left for 3 minutes, until the solution was clear. DNA , a 121 uL reaction volume for 8 reactions ( including 15 uL of the cleared supernatant was removed to a fresh PCR excess ) contained 90 uL of 5x Herculase II Reaction Buffer, plate or strip tube sample well and was kept on ice . The new 4.5 uL of 100 mM dNTP Mix , 18 uL of Forward Primer, and PCR plate was sealed containing libraries . The beads were 9 uL of 5x Herculase II Fusion DNA , or a 337 uL reaction discarded . The quality of sample libraries was checked using volume for 24 reactions ( including excess) contained 250 uL the an electrophoresis device, for example an automated US 2021/0005284 A1 Jan. 7. 2021 62 electrophoresis device ( e.g., a TapeStation System available Hybridize DNA Samples to the Capture Library from Agilent, www.agilent.com ) and a spectrophotometer, [ 0552 ] In some embodiments, the component reagents for for example a small volume full- spectrum , UV - visible spec hybridization using a SureSelect kit were thawed , according trophotometer ( e.g. , Nanodrop spectrophotometer available to the thawing conditions described below . Each reagent was from Thermo Fisher Scientific, www.thermofisher.com ), or vortexed to mix , then tubes were spun briefly to collect the the plate was stored at -20 ° C. liquid . [ 0549 ] Resources from the manufacturers, including Agi [ 0553 ] To each DNA library sample well , 5 ul SureSelect lent SureSelect XT HS Target Enrichment System for Illu XT HS and XT Low Input Blocker Mix (previously thawed mina Paired - End Multiplexed Sequencing Library protocol on ice ) were added . The wells were capped and then and Agilent SureSelect XT HS and XT Low Input Enzy vortexed at high speed for 5 seconds . The plate was spun matic Fragmentation Kit protocol, were incorporated by briefly to collect the liquid and any bubbles were released . reference herein . The sealed sample plates were transferred to the thermal cycler and the Hybridization program was started . The Example 4 : Hybridization -Capture and Target thermal cycler was programmed to pause during Segment 3 Enrichment of DNA Libraries of the Hybridization program to allow additional reagents to [ 0550 ] Hybridization -Capture based target enrichment be added to the Hybridization wells , as described in the next was used directly after Library Construction described in sections . During Segments 1 and 2 of the thermal cycling Example 3. This protocol described the steps to hybridize program , the additional reagents were prepared as described the prepared gDNA libraries with a target -specific capture in the next section . If needed , these steps could be finished probes. Target enrichment worked by mixing target- specific after the thermal cycler program pauses in Segment 3. A biotinylated probes with the DNA Library. The probes were 25 % solution of SureSelect RNase Block ( e.g. , previously bound to the targets which were then isolated by streptavidin thawed on ice ) was prepared and was mixed well by coated magnetic bead pulldown , leaving uncaptured DNA vortexing, and the mix was briefly centrifuged and then kept ( the areas of the genome that we do not want) behind . The on ice . steps to hybridize the prepared DNA libraries with a target [ 0554 ) Further, a Capture Library Hybridization Mix was specific capture library were provided . After library prepa prepared as follow for one or more reactions . For example , ration, the libraries were denatured and biotin - labeled a 13 ul reaction volume for 1 reaction contained 2 ul of probes specific to targeted regions were used for hybridiza 25 % RNase Block solution , 5 uL of Capture Library 3 Mb tion . The pool was enriched for regions of interest by adding ( e.g. , previously thawed on ice ) , and 6 uL of SureSelect Fast streptavidin - coated beads that were bound to the bioti Hybridization Buffer ( e.g. , previously thawed and kept at nylated probes. DNA fragments bound to the streptavidin room temperature ), a 117 uL reaction volume for 8 reactions coated beads via biotinylated probes were magnetically ( including excess ) contained 18 uL of 25 % RNase Block pulled down from the solution . The enriched fragments were solution , 45 uL of Capture Libraryz3 Mb, and 54 uL of then eluted from the beads . Each DNA library sample must SureSelect Fast Hybridization Buffer, or a 325 uL reaction be hybridized and captured individually. Some or all of the volume for 24 reactions ( including excess ) contained 50 uL procedures were managed and conducted by lab personnel. of 25 % RNase Block solution , 125 uL of Capture Library 3 In the event that quality control related issues arise , lab Mb, and 150 uL of SureSelect Fast Hybridization Buffer . personnel will notify the provider ( e.g. , healthcare provider ) [ 0555 ] The listed reagents were combined at room tem of the biopsy sample or the extracted DNA . As a general perature, mix well by vortexing at high speed for 5 seconds , work procedure , before beginning the procedure , work sur and then were spun down briefly. The mixture was just faces and pipettes were thoroughly disinfected by wiping prepared before pausing the thermal cycler in Segment 3 . down with 10 % bleach , followed by 70 % ethanol. The same The mixture was kept at room temperature briefly until the cleaning process was followed after completion of work mixture was added to the DNA samples on the cycler . procedure. Solutions containing the Capture Library were not kept at room temperature for extended periods. Normalization of Samples for Hybrid Capture [ 0556 ] The thermal cycler was pauses at Segment 3 of the Hybridization program ( 1 minute at 65 ° C. ) . With the cycler [ 0551 ] 12 ul nuclease - free water was used to normalize paused , and while keeping the DNA + Blocker samples in the samples to 500-1000 ng . The maximum amount of DNA cycler, 13 ul of the room - temperature Capture Library available was used for each sample , within the range pro Hybridization Mix was transferred to each sample well and vided . The lab personnel then navigated to the normalization was mixed well by pipetting up and down slowly 10 times . spreadsheet, located in the Clinical Lab . Documents folder The wells were sealed with fresh domed strip caps and that in the shared Google Drive . The tab labeled “ HC Normal all wells were made sure to completely sealed . A compres ization ” was selected . The Sample ID was entered into sion pad was placed on the plate to prevent evaporation column A and the measured concentration was entered into during hybridization . The Play button was pushed to resume column B. The spreadsheet automatically calculated the the thermal cycling program to allow hybridization of the volumes of sample and low TE required for normalization in prepared DNA samples to the Capture Library. Wells were columns G and H. If the concentration of a sample was on adequately sealed to minimize evaporation to prevent results the lower side , the spreadsheet calculated a volume of from being negatively impacted . sample > 12 uL and a volume of nuclease - free water < 0 uL . If this occurred , only 12 uL of sample was used , and the sample was not diluted . Using the volumes calculated in the Prepare Streptavidin - Coated Magnetic Beads spreadsheet, appropriate volumes were normalized into a 96 [ 0557 ] In some embodiments, the bead preparation steps well semi- skirted PCR plate . began approximately one hour after starting hybridization . US 2021/0005284 A1 Jan. 7 , 2021 63

Reagents for capture from the SureSelect XT HS Target and then pipetted up and down 8 times to resuspend the Enrichment Kit ILM Hyb Module included the SureSelect beads . The plate was sealed and the samples were kept on ice Binding Buffer, the SureSelect Wash Buffers 1 and 2 ( e.g. , until they were used later. Captured DNA was retained on all kept at room temperature ), and Dynabead MyOne the streptavidin beads during the post - capture amplification Streptavidin T1 ( e.g. , stored at 2 ° C. to 8 ° C. ) . Dynabeads step . MyOne Streptavidin T1 magnetic beads were brought to room temperature for at least 30 minutes . The Dynabeads Amplify the Captured Libraries MyOne Streptavidin T1 magnetic beads were vigorously resuspend on a vortex mixer . The magnetic beads settled [ 0561 ] In some embodiments , reagents for post - capture during storage . For each hybridization sample, 50 ul of the PCR amplification were thawed and kept on ice , and resuspended beads was added to wells of a fresh PCR plate . included a Herculase II Fusion DNA Polymerase (mixed by The beads were washed by adding 200 ul of SureSelect pipetting up and down ), a 5x Herculase II Reaction Buffer, Binding Buffer, mixing by pipetting up and down 20 times 100 mM dNTP Mix , and SureSelect Post - Capture Primer or capping the wells and vortexing at high speed for 5-10 Mix ( e.g. , all mixed by vortexing ). seconds. The plate was put into a magnetic separator device [ 0562 ] The Post - Capture PCR thermal cycler program was and waited for the solution to clear, approximately 5 min started to preheat the cycler. Appropriate volumes of PCR utes . The supernatant was removed and discarded . The wash reaction mix were prepared, on ice , and mixed well on a steps were repeated two more times , for a total of three vortex mixer. For example, a 25 uL reaction volume for 1 washed . The beads were resuspended in 200 ul of SureSelect reaction contained 12.5 uL of nuclease - free water, 10 uL of Binding Buffer. 5x Herculase II Reaction Buffer, 1 uL of Herculase II Fusion DNA Polymerase, 0.5 uL 100 mM dNTP Mix , and 1 uL Capture the Hybridized DNA Using Streptavidin - Coated of SureSelect Post - Capture Primer Mix . Beads [ 0563 ] For each reaction , 25 ul of the PCR reaction mix [ 0558 ] After the hybridization step was complete on the was added to each sample well containing bead - bound thermal cycler, the samples were transferred to room tem target- enriched DNA . The PCR reactions were mixed well perature . The entire volume ( approximately 30 ul) was by pipetting up and down until the bead suspension was immediately transferred of each hybridization mixture to the homogeneous . Splashing samples onto well walls was wells containing 200 ul of washed streptavidin beads using avoided and the samples were not spun at this step . The plate a multichannel pipette. The mixture was pipetted up and was sealed well . The plate was placed in the thermal cycler down 5-8 times to mix and then the wells were sealed with and compression pad was placed on the plate to prevent fresh caps. The capture plate was incubated on a 96 - well evaporation . The Play button was pressed to resume the plate mixer and was mixed at 1500 rpm for 30 minutes at Post - Capture PCR thermal cycler program . When the PCR room temperature . The samples were properly mixed in the amplification program was complete , the plate was spun wells . During the 30 - minute incubation for capture, Sure briefly. The streptavidin - coated beads were removed by Select Wash Buffer 2 was pre -warmed in the thermal cycler placing the plate on the magnetic stand at room temperature . at 70 ° C. by placing 200 uL aliquots of Wash Buffer 2 in The solution was waited to clear ( approximately 2 minutes ), wells of a fresh 96 - well plate and aliquot 6 wells of buffer and then each supernatant ( approximately 50 ul ) was trans for each DNA sample in the run . ferred to wells of a fresh plate . The beads could be discarded [ 0559 ] The wells were capped and then incubated in the at this time . thermal cycler, with heated lid ON , held at 70 ° C. until time for use . When the 30 -minute sample incubation period was Purify the Amplified Capture Libraries Using AMPure XP complete, the samples were briefly spun to collect the liquid . Beads The plate was put in a magnetic separator to collect the [ 0564 ] In brief, the AMPure XP beads were come to room beads and was waited until the solution was clear, then the temperature for at least 30 minutes . The beads were not supernatant was removed and discarded . The beads were frozen at any time . 400 ul of fresh 70 % ethanol per sample resuspended in 200 ul of SureSelect Wash Buffer 1 and were was prepared for later use in step as described herein . The mixed by pipetting up and down 15-20 times , until beads AMPure XP bead suspension was mixed well so that the were fully resuspended . The plate was put in the magnetic suspension appeared homogeneous and consistent in color . separator and was waited for the solution to clear ( approxi 50 ul of the homogeneous AMPure XP bead suspension was mately 1 minute ), and then the supernatant was removed and added to each amplified DNA sample ( approximately 50 ul ) discarded . The plate was removed from the magnetic sepa in the PCR plate and was mixed well by pipetting up and rator and was transferred to room temperature. The beads down 15-20 times , or the wells were capped and vortexed at were washed with Wash Buffer 2 , using the steps below : 1 ) high speed for 5-10 seconds . The beads were made sure to resuspend the beads in 200 ul of 70 ° C. pre -warmed Wash be in a homogeneous suspension in the sample wells . Each Buffer 2 ; 2 ) pipetted up and down 15-20 times , until beads well had a uniform color with no layers of beads or clear were fully resuspended ; 3 ) incubated the samples for 5 liquid present. The samples were then incubated for 5 minutes at 70 ° C. on the thermal cycler with the heated lid minutes at room temperature. The plate was put on the on ; 4 ) After the 5 minute incubation , the plate was put in the magnetic stand at room temperature and was waited for the magnetic separator at room temperature ; 5 ) the solution was solution to clear ( approximately 3 to 5 minutes ). While waited to clear ( approximately 1 minute ), then the superna keeping the plate on the magnetic stand , the cleared solution tant was removed and discarded ; and 6 ) the wash steps were from each well was carefully removed and discarded . The repeated five more times for a total of 6 washes . beads were not disturbed while removing the solution . The [ 0560 ] After verifying that all wash buffer was removed , plate was continued to be placed on the magnetic stand while 25 ul of nuclease - free water was added to each sample well dispensing 200 ul of freshly - prepared 70 % ethanol in each US 2021/0005284 A1 Jan. 7 , 2021 64 sample well and waited for 1 minute to allow any disturbed related issues arise , lab personnel will notify the provider beads to settle , then the ethanol was removed . ( e.g. , healthcare provider) of the biopsy sample or the [ 0565 ] The ethanol wash was repeated once for a total of extracted RNA . two washes . All of the ethanol at each wash step was carefully removed . The wells with then sealed with strip Adenylate 3 ' Ends caps, and then were briefly spin to collect the residual ethanol. The plate was returned to the magnetic stand for 30 [ 0569 ] The reagents were prepared according to the con seconds. The residual ethanol was removed with a P20 ditions below . In brief, 2.5 uL Resuspension buffer was pipette . Next, the samples were dried by keeping them at added to each well containing sample ( Resuspension Buffer room temperature until the wells were dry ( about 5-10 is typically stored at -25 ° C. to -15 ° C. and let stand for 30 minutes ). The bead pellet was ensured to not start to crack , minutes to bring to room temperature before use ) . 12.5 as this was a sign of over drying. 25 ul of nuclease - free water ULA - Tailing Mix was added to each well , and then was was then added to each sample well . The sample wells were mixed thoroughly by pipetting up and down 10 times sealed , mixed well on a vortex mixer and then briefly spun ( A - Tailing Mix is typically stored at -25 ° C. to -15 ° C. and to collect the liquid without pelleting the beads . The wells thawed at room temperature ). The plate was sealed and were incubated for 2 minutes at room temperature . The plate centrifuged at 280xg for 1 minute . The plate was incubated was put on the magnetic stand and left until the solution was on the ATAIL70 program of the thermal cycler. The clear. A new PCR plate was labeled with the Run ID . The ATAIL 70 program was as the following steps: 1 ) preheat lid : cleared supernatant ( approximately 25 ul ) was transferred to 100 ° C. hold time , 2 ) step 1 : 37 ° C. for 30 minutes, 3 ) step the fresh plate . The beads could be discarded at this time . 2 : 70 ° C. for 5 minutes , and 4 ) step 3 : 4 ° C. hold time . The Then , the quality of captured libraries was checked by qPCR plate was then centrifuged at 280xg for 1 minute . methods by using the Roche LightCycler SOP, or stored at -20 ° C. Ligate Adapters [ 0570 ] The reagents were prepared according to the con Example 5 : The Constructions of RNA Libraries ditions below . In brief, the RNA Adapter tubes were centri for Sequencing fuged at 600xg for 5 seconds . Ligation Mix was removed from -25 ° C. to -15 ° C. storage . The following reagents [ 0566 ] RNA libraries were prepared before performing the were added in the order listed to each well : 1 ) 2.5 uL downstream sequencing. In brief, this protocol explained Resuspension Buffer, 2 ) 2.5 uL Ligation Mix , and 3 ) 2.5 uL how to convert cDNA was synthesized from mRNA in a RNA Adapter Indexes . The mixed reagents were then mixed total RNA sample, into a library of DNA for hybridization thoroughly by pipetting up and down 10 times and were capture prior to sequencing. The reagents provided in an centrifuged at 280xg for 1 minute . The plate was placed on Illumina TruSeq Stranded mRNA library prep workflow the thermal cycler and the LIG program was run . The LIG were used . program was as the following : 1 ) preheat lid : 100 ° C. hold ( 0567 ] The process involved the adenylation of the 3 ' ends time , 2 ) step 1 : 30 ° C. for 10 minutes, and 3 ) step 2 : 4 ° C. of blunt ended fragments by the addition of one adenine hold time . The Stop Ligation Buffer was centrifuged at nucleotide . This prevented them from ligating to each other 600xg for 5 seconds. Once the LIG program stopped , the during adapter ligation reaction . One corresponding thymine plate was removed from thermal cycler and 5 uL Stop nucleotide on the 3 ' end of the adapter provided a comple Ligation Buffer was added to each well , and was mixed mentary overhang for ligating the adapter to the fragment. thoroughly by pipetting up and down . The plate was then This strategy ensured a low rate of chimera ( concatenated centrifuged at 280xg for 1 minute . Ligation Mix from template ) formation . In the next step , multiple indexing storage was not removed until instructed to do so in the adapters were ligated to the ends of the ds cDNA fragments , procedure. RNA Adapter Indexes are typically stored at which prepared them for hybridization onto a flow cell . -25 ° C. to -15 ° C. and thawed at room temperature for 10 Fragments with no adapters were not hybridized to surface minutes prior to use . Resuspension Buffer and AMPure XP bound primers on the flow cell . Fragments with an adapter Beads are typically stored at 2 ° C. to 8 ° C. and let stand for on one end can hybridize to surface bound primers, but did 30 minutes to bring to room temperature before use . Stop not form clusters . The DNA fragment enrichment process Ligation Buffer is typically stored at -25 ° C. to -15 ° C. and used PCR to selectively enrich those DNA fragments that thawed at room temperature before use . had adapter molecules on both ends and to amplify the amount of DNA in the library . PCR was performed with a Clean Up Ligated Fragments PCR Primer Cocktail that annealed to the ends of the [ 0571 ] In brief, 42 uL AMPure XP beads were added to adapters. RNA Library Construction consisted of three steps each well and mixed thoroughly by pipetting up and down as described herein . Introduction above followed by a before incubated at room temperature for 15 minutes. After library clean up and library quantitation by qPCR per the incubation , the mix was centrifuged at 280xg for 1 minute . protocol of using a nucleic acid amplification device ( e.g. , a The wells were then placed on a magnetic stand and waited PCR system ), for example a real - time PCR system ( e.g. , a until the liquid is clear ( about 2-5 minutes ). While waiting LightCycler Instrument 480 available from Roche , www . for the liquid to clear, fresh 80 % EtOH was made for use in lifescience.roche.com ) by Roche Life Science . Accurate the two washes step above. After the liquid was cleared , all quantification achieved by qPCR allowed to create optimum supernatant was removed and discarded from each well , and cluster densities across all four lanes of the flow cell . was wash two times as the following : 1 ) added 200 uL fresh [ 0568 ] Some or all of the procedures were managed and 80 % EtOH to each well , 2 ) incubated on the magnetic stand conducted by lab personnel. In the event that quality control for 30 seconds , and 3 ) removed and discarded all superna US 2021/0005284 A1 Jan. 7 , 2021 65 tant from each well . 20 ÎL pipette was used to remove well . The magnetic stand was air - dried for 5 minutes. The residual EtOH from each well . bead pellet was ensured to not start to crack , which was a [ 0572 ] The magnetic stand was air - dried for 5 minutes . sign of over drying . The magnetic stand was removed next . The bead pellet did not start to crack , as this was a sign of 32.5 uL Resuspension buffer was added to each well , and over drying. The magnetic stand was then removed . 52.5 uL was mixed thoroughly by pipetting up and down 10 times Resuspension buffer was added to each well and mixed before incubating at room temperature for 2 minutes . The thoroughly by pipetting up and down before incubating at wells were centrifuged at 280xg for 1 minute . A magnetic room temperature for 2 minutes . The mixed buffer was stand was placed and waited until the liquid was clear ( 2-5 centrifuged at 280xg for 1 minute . A magnetic stand was minutes ). 30 uL supernatant was suspended to the corre placed and waited until the liquid was clear ( about 2-5 sponding well of a newly labeled PCR plate . The lab minutes ). 50 uL supernatant was transferred to the corre personnel then proceeded with library QC using the a sponding well of a newly labeled PCR plate . 50 uL AMPure nucleic acid amplification device ( e.g. , a PCR system ), for XP beads were added to the plate and mixed thoroughly by example a real - time PCR system ( e.g. , a LightCycler Instru pipetting up and down before incubating at room tempera ment 480 available from Roche , www.lifescience.roche . ture for 15 minutes . The plate was centrifuged at 280xg for com ) , or the plate was sealed and stored at -20 ° C. for up to 1 minute . A magnetic stand was placed and waited until the 7 days. PCR Primer Cocktail is typically stored at -25 ° C. liquid was clear ( 2-5 minutes ). All supernatant was removed to -15 ° C. and thawed at room temperature before use . PCR and discarded from each well . The well was washed two Master Mix is typically stored at -25 ° C. to -15 ° C. and times as following: 1 ) added 200 uL fresh 80 % EtOH to each thawed on ice before use . Resuspension Buffer and AMPure well , 2 ) incubated on the magnetic stand for 30 seconds , and XP Beads are typically stored at 2 ° C. to 8 ° C. and let stand 3 ) removed and discarded all supernatant from each well . for 30 minutes to bring to room temperature before use . [ 0573 ) After that, 20 uL pipette was used to remove [ 0576 ] Resource from the manufacturers , including residual EtOH from each well . The magnetic stand was TruSeq Stranded mRNA Reference Guide , is incorporated air - dried for 5 minutes. The bead pellet was ensured to not by reference herein . starting to crack , as this would be a sign of over drying. The bead pellet was then removed from the magnetic stand . 22.5 Example 6 : Quality Control Concerning the uL Resuspension Buffer was added to each well and mixed DNA /RNA Library Preparation Process Based on thoroughly by pipetting up and down before incubating at DNA and RNA from Fresh Frozen Tissue Library room temperature for 2 minutes. The wells then were Sequencing centrifuged at 280xg for 1 minute . A magnetic stand was [ 0577 ] For library preparation, extracts of DNA and RNA placed and waited until the liquid was clear ( 2-5 minutes ). from the tissue were obtained by using the AllPrep DNA / 20 uL supernatant was transferred to the corresponding well RNA Mini Kit . Any suitable extraction kit known in the art of a newly labeled PCR plate . The beads were not disturbed could also be used . Library construction from purified DNA during the process . Alternatively , this step was a safe stop was carried out with Agilent SureSelect XT HS and Agilent ping point. The plate could be sealed and stored at -25 ° C. SureSelect Human All Exon V7 exome kits . Library con to -15 ° C. for up to 7 days . struction from purified RNA was carried out with Illumina TruSeq mRNA stranded kit . Quality control ( QC ) metrics Enrich DNA Fragments were carried out after each stage of library preparation . All [ 0574 ] The reagents were prepared according to the con QC metrics were prepared with a spectrophotometer, for ditions below . In brief, the PCR plate was placed on ice and example a small volume full- spectrum , UV - visible spectro 5 uL PCR primer cocktail was added to each well . 25 uL photometer ( e.g. , Nanodrop spectrophotometer available PCR Master Mix was added to each well , and then mixed from ThermoFisher Scientific , www.thermofisher.com ), a thoroughly by pipetting up and down 10 times . The sample fluorometer, for example for quantification of DNA or RNA wells were sealed and centrifuged at 280xg for 1 minute . ( e.g. , a Qubit Flex fluorometer available from ThermoFisher The sample wells were placed on the thermal cycler and the Scientific , www.thermofisher.com ), a nucleic acid amplifi mRNA PCR program was performed . The mRNA PCR cation device ( e.g. , a PCR system ) , for example a real - time program was as the following: 1 ) preheat lid : 100 ° C. hold PCR system ( e.g. , a LightCycler Instrument 480 II available time , 2 ) step 1 : 98 ° C. for 30 seconds, 3 ) step 2 ( 15 cycles ) : from Roche , www.lifescience.roche.com ) and an electro 98 ° C. for 10 seconds , 60 ° C. for 30 seconds , and 72 ° C. for phoresis device, for example an automated electrophoresis 30 seconds , 4 ) step 3 : 72 ° C. for 5 minutes, and step 4 ) 4 ° device ( e.g. , a , Agilent TapeStation System 4150 available C. hold time . from Agilent, www.agilent.com ). All measurement content [ 0575 ] Once the program was complete and the plate was carried purity, concentrations and size of DNA /RNA frag centrifuged at 280xg for 1 minute . AMPure XP beads were ments . mixed by thorough vortexing and 50 uL was added to each [ 0578 ] QC metrics during the next generation sequencing well and mixed thoroughly by pipetting up and down 10 experiment were a set of individual parameters that evalu times before incubating at room temperature for 15 minutes . ated the overall quality of the data set . The following metrics The sample wells were centrifuged at 280xg for 1 minute . A were evaluated : cluster density, percentage of clusters pass magnetic stand was placed and waited until the liquid was ing filters that were assigned to an index , quality score of 30 clear ( 2-5 minutes ). All supernatant was removed and dis ( Q30 ) and error rate . The next stage was to estimate quality carded from each well . The wells were washed two times as metrics used in the bioinformatics pipeline (Bioinformatics the following: 1 ) added 200 uL fresh 80 % EtOH to each QC ) . It was divided into two processes : WES ( DNA well , 2 ) incubated on the magnetic stand for 30 seconds , and sequencing) and RNA sequencing ( RNA - seq ). The follow 3 ) removed and discarded all supernatant from each well . A ing metrics were taken into account: tumor purity, depth of 20 uL pipette was used to remove residual EtOH from each coverage , alignment rate , base call quality scores or Phred US 2021/0005284 A1 Jan. 7 , 2021 66 score , uniformity of coverage , GC content, mapping quality, and hybridization and capture. Measuring the concentrations duplication rate , insert size , contamination , SNP concor of extracts , primary libraries and libraries after hybridization dance , HLA allele concordance and ADA genomes contami and capture , the quality of products from the samples were nation . identified . Based on the determination of the quality of DNA [ 0579 ] In general, the protocol described in the present or RNA in the tested sample, a decision was made to either example provides the metrics for QC methods used in the go forward to the next step or to repeat the processes. A library preparation stages and in the bioinformatic pipeline spectrophotometer, for example a small volume full - spec of whole - exome sequencing ( WES ) and RNA - seq analysis . trum , UV - visible spectrophotometer ( e.g. , Nanodrop spec Bioinformatic pipeline was divided into two components : trophotometer available from Thermo Fisher Scientific , QC from sequencing platform and Bioinformatics QC . On a www.thermofisher.com ), a fluorometer, for example for sequencer platform and in Bioinformatics QC , a table with quantification of DNA or RNA ( e.g. , a Qubit fluorometer estimated metrics for WES and RNA - seq data was provided . available from ThermoFisher Scientific, www.thermofisher. For some metrics within a targeted range , which were the com ) , an electrophoresis device, for example an automated most appreciable value and acceptable range , the sample electrophoresis device ( e.g., a TapeStation System available data could be used . If the values of the obtained metrics fell from Agilent, www.agilent.com ) and a nucleic acid ampli outside of the acceptable range , the corresponding sample fication device ( e.g. , a PCR system ) , for example a real - time was considered having poor quality . PCR system ( e.g. , a LightCycler Instrument available from [ 0580 ] In any given experiment or project, one or more of Roche , www.lifescience.roche.com ) were used for measur the quality control processes can be used . In some experi ing purity, concentrations and size of DNA /RNA fragments . ment or project, all of the quality control processes can be The acceptable and the targeted ranges of DNA and RNA for used . Some or all of the procedures were managed and the respective devices for each phase are indicated in the conducted by lab personnel. In the event that quality control tables below . The results of the quality control can be related issues arise , lab personnel will notify the provider confirmed by performing electrophoresis. The results of the ( e.g. , healthcare provider) of the biopsy sample or the quality control can be confirmed by the determination of the extracted DNA /RNA . size distribution of the nucleic acids . [ 0582 ] The present Example provides troubleshooting Quality Control Steps in the Process of DNA and RNA protocols. For example, if an additional peak at 150 bp at the Library Preparation electropherogram from the TapeStation on the LC or HC [ 0581 ] Table 4 - Table 6 describe embodiments of DNA and stage was observed . An additional step of washing with RNA library preparation including one or more quality AMPure beads ( library to beads 1 : 0.8 volume ratio ) were control steps at each phase : extraction , library construction , made . TABLE 4 Quality Control: Extraction of DNA or RNA for Library Preparation Extraction Device DNA (Acceptable / Target) RNA (Acceptable / Target) Comments Total amount for next step Total amount for next step 20-200 ng 1 30-200 ng 100-1000 ng | > 200-1000 ng Nanodrop concentration > 4 ng /ul | concentration > 3 ng/ ul | If the metrics are out of range >5.5 ng/ ul > 5 ng /ul you have to repeat extraction 260/280 > 1.5 | 1.8-2.0 260/280 > 1.5 | 1.8-2.0 process 260/230 > 1.5 2.0-2.2 260/230 > 1.5 | 2.0-2.2 Qubit Concentration > 3 ng /ul | Concentration > 2 ng /ul | > 4.5 ng /ul >4 ng /ul Tapestation Concentration > 2.5 ng /ul | Concentration > 1.5 ng /ul | > 4 ng /ul > 3 ng /ul RIN > 5 > 8 TABLE 5 Quality Control: DNA or RNA for Library Construction Library construction Device DNA ( Acceptable / Target) RNA ( Acceptable Target) Comments Total amount for next step Total amount for next step 200-1000 ng 500-1000 ng 0.5-4 nmol / l | 0.5-4 nmol / l Qubit Concentration > 17 ng /ul | Concentration > 0.1 ng / ul If the metrics are out of range , > 42 ng /ul > 0.1 ng /ul extraction process can be repeated . Tapestation Concentration > 15 ng /ul | Concentration > 0.1 ng /ul | If an electropherogram with > 40 ng /ul > 0.1 ng/ ul an additional peak at 150 bp is Concentration > 0.5 nmol/ l | observed , the troubleshooting > 0.5 nmol / l as described in the present Average 370-440 | 370-440 Example would provide guidance . LightCycler Not required Concentration > 0.5 nmol / l | > 0.5 nmol / 1 US 2021/0005284 A1 Jan. 7 , 2021 67

TABLE 6 Bioinformatics OC Quality Control : DNA or RNA for Library after Hybridization [ 0584 ] After the performance of sequencing ( e.g. , RNA and Capture seq ) , quality control can be performed for bioinformatics Device DNA (Acceptable Target) Comments Final concentration for pipeline . In brief, the software was created and generated by pooling the following parameters for measurements : single nucleo 0.5-4 nmol / l | 0.5-4 nmol / l Qubit Concentration > 0.1 ng / ul | If the metrics are out of range , tide variants ( SNVs; somatic plus germline variants ), small > 0.1 ng /ul extraction process can be indels , copy number alterations ( CNAs) ( plus loss of het repeated . erozygosity ( LOH ) ) , focal amplifications /deletions , gene Tapestation Concentration > 0.1 ng / ul | If you see an > 0.1 ng/ ul electropherogram with an fusions rearrangement (mRNA expressed ), fusion protein Concentration > 0.5 nmol / l | additional peak at 150 bp is expression , RNA expression ( for biomarker proteins ), and > 0.5 nmol/ 1 observed , look in the Average 380-440 | 380-440 troubleshootinglist ( # 1 ) as tumor mutational burden ( TMB ) . described in the present [ 0585 ] In any given experiment or project, one or more LightCycler Concentration > 0.5 nmol / 1 | Example would provide guidance. paramaters could be used for quality control. For example, > 0.5 nmol / 1 only SNV detection was performed for certain bioinformatic analysis . Only in /del detection was performed for certain bioinformatic analysis. Only CNA detection was performed for certain bioinformatic analysis . Only fusion detection was performed for certain bioinformatic analysis. Only RNA Quality Control Steps after DNA and RNA Library Prepa expression measurement detection was performed for cer ration tain bioinformatic analysis. Only TMB measurement detec [ 0583 ] The main quality control metrics for the sequenc tion was performed for certain bioinformatic analysis. In ing process occurred on the Illumina NextSeq® 500/550 other examples , SNV detection and CAN detection can be sequencer. Table 7 shows the QC parameters of the sample performed for certain bioinformatic analysis. run for whole - genome sequencing ( WES ) and RNA -se [ 0586 ] Table 8 provides a lists of quality control param quencing eters for bioinformatics analysis. TABLE 7

Quality Control: Sequencing Processes Sample run

QC parameter Targeted values WES Targeted values RNA - seq Comments Cluster density Targeted range 170-220 Targeted range 170-220 Cluster density and CLUSTER PF ( % ) Acceptable range < 280 Acceptable range < 280 Alignment rate should be monitored in every run . Actual Yield > 15 Gbp The yield that's been received % 2Q30 Targeted range > 85 % Targeted range > 85 % The percentage of bases Acceptable range > 75 % Acceptable range > 75 % with a quality score of 30 or higher Quality scores and quality of signal / noise ratio should be monitored in every run . ERROR RATE % Targeted < 0.7 % Targeted < 0.7 % Refers to the percentage of Acceptable < 1 % Acceptable < 1 % bases called incorrectly at any one cycle . It is calculated from the reads that are aligned to Illumina's PhiX control. If this was not used then % Q30 would be the best tool to check base quality. Error rate increases along the length of the read . US 2021/0005284 A1 Jan. 7. 2021 68

TABLE 8 Quality Control : Bioinformatics Bioinformatics Targeted values Targeted values QC parameter WES RNA -seq Comments Tumor purity > = 20 % > = 20 % In case of failure - inform the physician if to proceed with sample below LOD Depth of coverage > = 150x average > 50 mln pair -end In case of failure - coverage - tumor reads resequence the sample sample > = 100x average coverage of normal tissue Alignment rate > 90 % > 90 % Base call quality Phred score > 30 Phred score > 30 Quality scores and quality scores of signal/ noise ratio should be monitored in every run . Low - quality scores can lead to increased false positive variant calls ; thus , results must be interpreted with caution and repeat testing may be indicated . Uniformity of 85 % of base pairs Not applicable coverage in target regions covered > = 20x for tumor tissue 85 % of base pairs in target regions covered > = 20x for normal tissue GC bias Targeted value 50 Targeted value 50 GC bias should be Acceptable range : Acceptable range : monitored with every run 45-65 45-65 to detect changes in test performance or sample quality issues . In the last assembly of the reference human genome , the GC composition for the entire genome is ~ 40 % , 48.9 % for the RNA encoding . We use Exon V7 kit, the mean GC content for the targets there is 47.9 % . Mapping quality MapQ > = 10 Not applicable The proportion of reads that do not map to target regions must be monitored during each run . Poor mapping quality may be a result of non - specific amplification , capture of off target DNA , or contamination . Reads with MapQ = 0 are obtained with simultaneous alignment to several regions. Duplication rate < 30 % < 85 % The duplication rate should be monitored in every run and for each sample independently to monitor library diversity . Insert size Median insert size Median insert Insert size is the length of for tumor size [ 150 ; 200 ] the sequencing DNA ( or tissue ~ [ 150 ; 200 ] RNA ) that is " inserted " Median insert size between the adapters . for normal tissue [ 150 ; 200 ] Contamination < 0.05 % < 0.05 % Contamination - Percentage of sequence segments of foreign origin in sample . A contaminated sequence is one that does not faithfully represent the US 2021/0005284 A1 Jan. 7. 2021 69

TABLE 8 - continued Quality Control : Bioinformatics Bioinformatics Targeted values Targeted values QC parameter WES RNA - seq Comments genetic information from the biological source SNP concordance Targeted > 90 % Targeted > 90 % In case of failure , of a pair of samples Acceptable > 85 % Acceptable > 85 % investigate where the tumor / nomial from mix - up could have the same patient happened . Stop the bioinformatics analysis until the troubleshooting and sample concordance HLA allele Threshold for Threshold for In case of failure concordance of a normal vs tumor tumor RNAseq Investigate where the pair of samples tissue < 5 vs normal WES mix - up could have tumor /normal from tissue < 5 happened. Stop the the same patient bioinformatics analysis until the troubleshooting and sample concordance ADA genomes Targeted Targeted For One hit one genome , contamination threshold > 60 threshold > 40 GRCh38.d1.vd1 Acceptable Acceptable In case of failure - report threshold > 40 threshold > 20 to lab personnel ( with WARNING (with WARNING sign ) sign )

[ 0587 ] The present Example provides troubleshooting determining the sequences of the sequence data at the MHC protocols. For example , if the quality control of HLA allele loci at least one additional time , reporting the sequence data concordance of tumor / normal pair failed , it was an indicator as inconsistent, and /or a combination thereof. FIG . 11 illus that the tissues from different patients were potentially trates an example of MHC data validation . In FIG . 11 , six mixed up . The lab personnel should proceed to confirm the HLA alleles are determined from each of three sequence potential mix up and investigate the reason of the potential data sets (RNA - Seq data , WES tumor data , and WES normal mix up . The lab personnel should reach out to the physician data ) from two subjects ( e.g. , 103 and 105 ) . As can be seen if the mix up was not due to the internal errors in the from FIG . 11 , all three samples share all six alleles for laboratory subject 105 , indicating they are consistent and likely from the same subject. In the case of subject 103 however, there Example 7 : Determining the Sequence of Major is consistency for two sequence data sets ( between whole Histocompatibility Complexes ( MHC ) can be Used exome sequence data from a tumor sample ( WES Tumor) to Assess Sequence Data Identity and /or Integrity and whole exome sequence data from a normal sample ( WES Normal )) , but an inconsistency for a third sequence [ 0588 ] MHC genes are highly polymorphic , with large data set ( RNA sequence data allegedly from the same numbers of alleles for the genes of each class ( e.g. , Class I, subject 103 ) . II , and III ) of MHC ( e.g. , human leukocyte antigens ( HLAS ) [ 0589 ] In some embodiments, the sequence of at least one in humans ). The combination of the number of potential MHC locus is determined and verified against at least one alleles in a population with the number of genes in each MHC sequence of the same locus from a reference sequence individual result in a large number of unique MHC profiles. data set ( e.g. , from a sample asserted to be from the same These can be used to assess the likelihood that sequence data subject ). In some embodiments, two or more MHC alleles is from a given source or subject ( or if sequence data from loci are sequenced ( e.g. , at least three , four, or five MHC loci multiple samples are from the same subject ). Sequences are sequenced ). In some embodiments , six MHC loci are corresponding to one or more MHC loci can be used to sequenced. In some embodiments , more than six MHC loci determine an MHC allele combination for a particular are sequenced nucleic acid sample . The result of the sequencing one or [ 0590 ] In some embodiments, the sequence data is from a more MHC loci can be evaluated against asserted informa human subject. In some embodiments, the MHC is human tion ( e.g. , an asserted HLA allele combination ) which is leukocyte antigens ( HLA ) . Accordingly, in some embodi expected to be consistent with the sequence data . If the ments two or more HLA loci are sequenced ( e.g. , three, four, determined MHC combination matches the asserted infor five, or more HLA loci ) . In some embodiments , six HLA loci mation , the sequence data is consistent. If the determined are sequenced . In some embodiments, more than six HLA MHC combination does not match the asserted information , loci are sequenced . the sequence data is inconsistent and this can indicate a problem with the sample and / or sequence data . For example , [ 0591 ] In some embodiments, the results are displayed to the sample and / or sequence data may have been contami a user in a report ( e.g. , via a GUI ). nated , misidentified , degraded , or otherwise corrupted . This can prompt investigation into the origin of the inconsistency. Example 8 : Predicted Tumor Type can be Used to Such investigation may entail determining the sequence of Assess Sequence Data Identity and / or Integrity the sequence data at the MHC loci at least one additional [ 0592 ] Various techniques can be used to predict, based on time , obtaining a second sequence data from the sample and nucleic acid sequence data , the type of tumor from which a US 2021/0005284 Al Jan. 7 , 2021 70 sample was taken ( e.g. , breast, colon , prostate , bladder, and its constituent subunits ), the sequence information can kidney , rectal, lung, lymphoma, melanoma, oral, oropharyn be validated . If the determined ratio does not match the geal , pancreatic , thyroid , uterine, eye, gastrointestinal, etc. ). expected ratio , the sequence data is inconsistent and may Many existing tools rely on large data sets of known samples indicate a problem with the sample or sequence data . For with evaluated biomarkers, thereby allowing for the com example , the sample and / or sequence data may have been parison of biomarkers from a sequence data set to be contaminated , misidentified , degraded , or otherwise cor evaluated against an existing known data set . Other methods rupted. This can prompt an investigation into the origin of of prediction utilize neural networks and deep learning the inconsistency . Such investigation may entail determining systems to analyze data sets and perform the data analysis . a new ratio from the sequence data at least one additional The sequence data can be trained against an existing net time , obtaining a second sequence data set from the sample work or data set to predict the type of tumor from which the and determining the ratio at least one additional time , sequence data was obtained . reporting the sequence data as inconsistent, and / or a com [ 0593 ] The result of a tumor type prediction (e.g. , deter bination thereof. mined information ) can then be evaluated against asserted [ 0597 ] FIG . 13A shows a graph representing expression information ( e.g. , a tumor type ) which is believed to be levels of subunits which agree with a predicted or known consistent with the sequence data . If the determined infor value for the subunits being evaluated or are within an mation matches the asserted information , the sequence data acceptable or determined threshold for such ratio . FIG . 13B is consistent and believed to be identified correctly. If the shows a graph representing expression levels of subunits determined information matches the asserted information, which disagree with the predicted or known value for the the sequence data can be processed to determine whether the subunits being evaluated or are outside an acceptable or sequence data is indicative of one or more disease features . determined threshold for such ratio . If the determined information does not match the asserted [ 0598 ] As can be seen in FIG . 13A , nucleic acids encoding information , the sequence data is inconsistent and may protein subunits can be evaluated against a known ratio ( e.g. , indicate a problem with the sample or sequence data . For an existing measured value , or a theoretical value based on example, the sample and /or sequence data may have been sequences known in the art) or can be evaluated against contaminated , misidentified , degraded , or otherwise cor measured data from known samples ( e.g. , fit to a line as rupted . This can prompt investigation into the origin of the shown ). When the ratio falls within accepted or established inconsistency. Such investigation may entail predicting the thresholds for variability and deviation, it is identified as tumor type from the sequence data at least one additional consistent. As can be seen in FIG . 13B , nucleic acids time , obtaining a second sequence data set from the sample encoding protein subunits can evaluated against a known and performing the prediction at least one additional time , ratio ( e.g., existing measured value , or theoretical value reporting the sequence data as inconsistent, and / or a com based on sequences known in the art ), or an be evaluated bination thereof. against measured data from known samples ( e.g. , fit against [ 0594 ] As can be seen in FIG . 12 , a tumor type can be a line as shown ). When the ratio falls outside accepted or predicted ( e.g. , BRCA associated breast cancer ) and evalu established thresholds for variability and deviation , it can be ated in the context of an asserted tumor type or in the context identified as inconsistent. of at least one additional sequence data set ( e.g. , reference [ 0599 ] In some embodiments, at least one ratio is deter sequence data , or sequence data from the same subject mined . In some embodiments, nucleic acids encoding a and / or the same tumor sample ). If the asserted information second protein and /or its subunits are evaluated to determine matches the determined information , the data is consistent. a second ratio . In some embodiments, nucleic acids encod If they do not match , it signals a possible inconsistency ing a third protein and / or its subunits are evaluated to which may be evaluated and / or reported to the user. Further, determine a third ratio . In some embodiments , nucleic acids when the determined value is evaluated in the context of encoding a fourth protein and / or its subunits are evaluated to additional sequence data , it can be used to evaluate whether determine a fourth ratio . In some embodiments , nucleic the sequence data are from the same subject or source , or acids encoding at least one additional protein and / or its from different sources . subunits are used to determine at least one additional ratio . [ 0595 ] Accordingly , in some embodiments a predicted [ 0600 ] In some embodiments, the subunits used to deter tumor type is determined from the sequence information and mine the ratio are CD3 subunits CD3D and CD3G . In some evaluated against an asserted tumor type . In some embodi embodiments, the subunits used to determine the ratio are ments, the results are displayed to a user in a report ( e.g. , via CD3 subunits CD3E and CD3D . In some embodiments, the a GUI) . subunits used to determine the ratio are CD3 subunits CD3G and CD3E . In some embodiments , the subunits used to Example 9 : Ratio of Protein Subunits can be Used determine the ratio are CD8 subunits CD8B and CD8A . In to Assess Sequence Data Identity and / or Quality some embodiments , the subunits used to determine the ratio [ 0596 ] Multi - subunit proteins encoded by the nucleic acid are the CD79 subunits CD79A and CD79B . can be used to evaluate the sequence data . The expression In some embodiments, the results are displayed to a user in levels of different subunits of a protein can be evaluated by a report ( e.g. , via a GUI) . determining the expression of each subunit ( e.g. , by deter Example 10 : Polyadenylation Status can be Used to mining DNA or RNA levels encoding each subunit ) and determining a ratio of the subunits ( e.g., by determining a Assess Sequence Data Identity and / or Integrity ratio of DNA or RNA levels encoding different protein [ 0601 ] PolyA status can be used to evaluate the sequence subunits in a nucleic acid sample ). This ratio (determined data . The sequence data can evaluated to determine whether information ) can then be validated against either asserted different genes are polyadenylated are present or not ( e.g. , information ( e.g. , an expected ratio ) or additional sequence histone genes, mitochondrial genes ) . This analysis can be data . If the ratio matches an expected ratio ( e.g. , a ratio used to evaluate and or assess the likelihood that an asserted either believed to be accurate based on other sequence data sample preparation protocol is correct ( e.g. , to validate obtained from the subject, or a known ratio for the protein whether an RNA sample is a polyA or a total RNA sample ). US 2021/0005284 A1 Jan. 7 , 2021 71

If the determined polyA status matches the asserted polyA ( PCA ) can be used ) . Such techniques can be useful in status, the sequence data is validated as consistent. If the evaluating sequence data for identity and / or integrity . For determined polyA status does not match the asserted polyA instance, exon coverage can be determined from the status , the sequence data is identified as inconsistent and sequence information and evaluated to determine whether may indicate a problem with the sample or sequence data . there is a consistent level of coverage when compared to Additionally, in the instance where ambiguous results are other sequence information or to an asserted ( e.g. , expected ) returned for the polyA status ( e.g. , where polyadenylated coverage level. An inconsistency in the coverage ( e.g. , genes are found , but others are not , or where unanticipated higher or lower coverage than expected ) could indicate that expression is found, or where less than expected expression the sequence data is from a different source than expected is found ( e.g. , partial expression ) ), it may indicate problems ( e.g. , than asserted ), or that there is a problem with the with the sample preparation , degradation of the sample from sequence data or the sample from which it was obtained . which the sequence data was prepared, or other quality [ 0605 ] Exon coverage can be determined for different issues . For example , the sample and/ or sequence data may batches of sequence data reads from a given subject and have been contaminated, misidentified , degraded , or other plotted against sequence data from other subjects. wise corrupted. This can prompt an investigation into the In some embodiments , the results of the evaluation are origin of the inconsistency. Such investigation may entail presented to a user in a report ( e.g. , via a GUI) . determining a polyA status from the sequence data at least one additional time , obtaining a second sequence data from Example 12 : RNAseq Read Distribution and the sample and determining the polyA status at least one Composition can be Used to Assess Sequence Data additional time , reporting the sequence data as inconsistent, Identity and / or Integrity and / or a combination thereof . [ 0602 ] FIGS . 14A - 14B show examples of bar graphs [ 0606 ] In some embodiments, read composition can be representing the probability that sequence information was evaluated in the context of the number of reads of a given obtained from samples that contained only polyadenylated component ( e.g. , protein coding sequence ) of the sequence RNA or from samples that contained total or all RNA ( total data in terms of either total number of reads for that RNA ). FIG . 14A shows positive results indicating component and / or as a relative percentage of that component sequences which appear uniform ) from the analysis of two calculated against the total number of reads ) . These can be different sequences. The left set of bars (bars 1-20 , as read compared against a threshold established for each parameter left to right) show results from a sequence which has a high ( e.g. , total number of reads , and / or reads of a component probability of being from samples which contained primar relative to the total number of reads) . ily polyadenylated RNA . The right set of bars (bars 21-40 , [ 0607 ] In some embodiments , a threshold is 20 million as read left to right) show results from a sequence which has total reads per protein coding region . In some embodiments, a high probability of being from samples which contained a threshold for the relative number of reads of a protein primarily total RNA . FIG . 14B shows poor results ( indicat coding region compared to the total number of reads in a ing , for example, possible contamination or degradation ) sample is 50 % or more . In some embodiments , the results from the analysis of two different sequences . The outlined are displayed to a user in a report ( e.g. , via a GUI) . box , tagged “ Bad , ” shows probability of the sequences as being from polyadenylated RNA about 50 % , indicating it is Example 13 : Biomarkers can be Used to Assess indeterminate that the sequences are uniform . Sequence Identity and / or Integrity [ 0603 ] As can be seen in FIG . 14A , the sequence data can [ 0608 ] Biomarkers can also be assessed to evaluate the be evaluated and a polyA status can be determined as either quality and / or identity of the sequence data . As shown in polyadenylated or total RNA in some embodiments . FIG . FIG . 15 , PCA can be performed to evaluate the expression 14B shows an example where the determination falls below of biomarkers . The results can be compared or trained a threshold of either polyadenylated or total RNA ( e.g. , 50 % against existing data sets of similar cohorts. The evaluation polyA , 50 % total RNA ). In this case , the sequence data can can be used to help validate an asserted information and / or be identified as inconsistent and / or of poor quality and may one or more additional sequence data sets . In some embodi signal a problem with the nucleic acid sample .Accordingly , ments , this will be useful to determine with increased in some embodiments the sequence data is identified as likelihood that the sequence information is from a given polyadenylated sequence data . In some embodiments, the source or subject. In contrast, inconsistency ( e.g. , if the sequence data is identified as total RNA sequence data . In evaluation does not match the asserted information and /or some embodiments, the threshold for identifying a sample as one or more additional sequence data sets ) may indicate that polyA is when the percent polyA RNA in a sample is above there is a potential quality issue related to the data that 50 % . In some embodiments , the threshold is 60 % . In some should be further investigated to identify the source of the embodiments, the threshold is 70 % . In some embodiments, inconsistency ( and / or that that the data should not be used the threshold is 80 % . In some embodiments , the threshold is for further analysis ). 90 % . In some embodiments, the threshold is 95 % . In some [ 0609 ] In some embodiments, the biomarker is for folli embodiments, the threshold is 96 % . In some embodiments, cular lymphoma. In some embodiments, the results are the threshold is 97 % . In some embodiments , the threshold is displayed to a user in a report ( e.g. , via a GUI ). 98 % . In some embodiments, the threshold is 99 % . In some embodiments, the results are displayed to a user in Example 14 : Non - Limiting Examples of Quality a report ( e.g. , via a GUI ) . Control Metrics for Assessing Sequence Data Identity and / or Integrity Example 11 : Exon Coverage can be Used to Assess Sequence Data Identity and / or Integrity [ 0610 ] In some embodiments, the disclosure relates to a method wherein at least one of the following additional [ 0604 ] Various techniques can be used for evaluating the features is determined : ( 1 ) mean quality score ; ( 2 ) contami consistency of data and / or to group data points for analysis nation value ; ( 3 ) GC content; ( 4 ) duplication level; ( 5 ) gene ( e.g. , in some embodiments principal component analysis body coverage ; and ( 6 ) per chromosome coverage . One or US 2021/0005284 A1 Jan. 7 , 2021 72 more of these determinations can be used to further assess embodiments, a quality threshold for further analysis the source or integrity of the sequence data by comparison includes greater than 10 million read counts ( e.g. , with a reference, or by comparison with at least one addi greater than 20 million read counts ), and / or a Phred tional sequence data set . score of greater than 25 ( e.g. , 28 or greater than 28 ) in [ 0611 ] In some embodiments, at least one additional fea more than 30 % of reads ( e.g. , in 50 % of reads or more ture is determined . In some embodiments, at least two than 50 % reads ). In some embodiments , a quality additional features are determined . In some embodiments , at control pipeline is stopped if the quality threshold is not least three additional features are determined . In some met . embodiments, at least four additional features are deter [ 0621 ] iii ) In some embodiments, sequence data is mined . In some embodiments , at least five additional fea screened against a library of sequences ( e.g. , using tures are determined . In some embodiments, at least six FastQ Screen , from Babraham Bioinformatics ), for additional features are determined . example to detect cross - species contamination ( e.g. , [ 0612 ] In some embodiments, evaluation of a concordance from other sources such as mouse , zebrafish , droso value of single nucleotide polymorphisms ( SNPs ) com phila , C. elegans, Saccharomyces, Arabidopsis, micro prises: ( a ) determining a concordance value of single biome , adapters, vectors , phiX , or other source ) . In nucleotide polymorphisms ( SNPs ) from the sequence data ; some embodiments, a quality control threshold for and ( b ) determining whether the concordance value of the further analysis based on cross - species contamination sequence data matches or exceeds a reference concordance is set at around 10 % , around 20 % , around 30 % , or value . In some embodiments , the reference concordance higher. For example, in some embodiments, a quality value is 80 % . control pipeline is stopped if sequence data comprises [ 0613 ] In some embodiments , evaluation of a contamina tion value comprises; ( a ) determining a contamination value 30 % or greater than 30 % contamination ( e.g. , with of the sequence data ; and ( b ) determining whether the bacterial sequence ). contamination value is less than a reference contamination [ 0622 ] iv ) In some embodiments, per chromosome cov value . In some embodiments, the reference contamination erage distribution and / or coverage distribution is deter value is 10 % . mined for one or more specific regions ( e.g. , one or [ 0614 ] In some embodiments, evaluation of a complexity more CCDS protein coding regions, exons , etc. ) using value comprises: ( a ) determining a complexity value of the an analytical tool ( e.g. , Mosdepth ). In some embodi sequence data; and ( b ) determining whether the complexity ments, a quality control threshold for further analysis value matches a reference complexity value . involves confirming that sequence data covers clini [ 0615 ] In some embodiments, evaluation of a Phred Score cally important genomic regions. In some embodi comprises : ( a ) determining a Phred Score of the sequence ments, a quality control pipeline is stopped if sequence data; and ( b ) determining whether the Phred Score matches coverage does not include one or more target genomic or exceeds a reference Phred Score . regions of interest. [ 0616 ] In some embodiments, evaluation of a GC content [ 0623 ] v ) In some embodiments , an analytical tool ( e.g. , comprises : ( a) determining a GC content of the sequence Picard ) is used to evaluate one or more sequence data data; and ( b ) determining if the GC content matches a parameters such as insert size , duplicates, mapping , reference GC content. pairing, or other parameter ( s ). [ 0617 ] In some embodiments , the methods further com [ 0624 ] vi ) In some embodiments, an analytical tool for prise generating a report to display the results of the at least evaluating RNA sequence data ( e.g. , RseQC as an one additional determination to a user ( e.g. , via a GUI) . example ) is used , for example , to determine insert size ( e.g. , inner distance between paired RNA reads ) , Example 15 : Non -Limiting Protocol for Assessing strandedness ( e.g. , to determine or confirm whether a Sequencing Data Quality Control stranded or non - stranded RNA sequence protocol was used ), and / or gene body coverage ( e.g. , to determine [ 0618 ] In some embodiments, a quality protocol for coverage bias , for example associated with an RNA sequence data ( e.g. , for WES and /or RNAseq data ) com extraction protocol, for example to distinguish polyA prises one or more of the following steps : versus total RNA sequence data ). [ 0619 ] i ) In some embodiments, low - quality reads ( for [ 0625 ] vii ) In some embodiments , a quality threshold example, based on positional information ) are for RNA analysis comprises determining the percent removed . In some embodiments, low -quality sequences age duplicates and / or adapter contamination and pro ( e.g. , reads from low - quality areas of a sequencing flow ceeding with further analysis for RNA sequence data cell ) are removed from sequence data ( e.g. , from a that has less than 70 % ( for example less than 60 % , or FASTQ file ). In some embodiments, if a significant less than 50 % ) duplicates and / or less than 25 % ( for fraction of the sequence reads are of low - quality ( e.g. , example less than 20 % , less than 15 % , or less than if bad tiles represent more than 30 % , more than 40 % , 10 % ) adapter contamination . Accordingly , an analysis or more than 50 % of a sequence data file ). protocol is terminated , in some embodiments, when [ 0620 ] ii ) In some embodiments, quality control tool for RNA sequence data that has more than 50 % ( e.g. , 60 % sequence data ( e.g. , FastQC as an example ) is used to or more than 60 % , or more than 70 % ) duplicates and / or countsevaluate ); qualityone or ofmore the ofsequencing library complexity platform ( e.g.( e.g. , ,based read more than 10 % ( e.g. , more than 15 % , 20 % or more than on a per base Phred quality score ); per tile quality 20 % , or more than 30 % ) adapter contamination . score ; per sequence GC content ( for example to detect [ 0626 ] viii ) In some embodiments, cross - individual contamination based on an unexpected GC content) , contamination is evaluated ( e.g. , using a concordance per base sequencing content ( e.g. , to detect adapter or and /or contamination estimator such as Conpair ) , for other contamination ); sequence duplication levels ( e.g. , example to determine the concordance of a pair of to evaluate the quality of RNA /DNA selection and / or samples ( e.g. , tumor and normal ) obtained from the PCR amplification ); and / or adapter content. In some same patient. In some embodiments, further analysis is US 2021/0005284 A1 Jan. 7 , 2021 73

performed if the normal and tumor samples ( e.g. , RNA comprises enriching the combined extracted RNA for normal and tumor DNA ) are identified as being from coding RNA . In some embodiments , the method further the same subject. comprises : extracting DNA from the second tumor sample; [ 0627 ] ix ) In some embodiments, a tumor - type classifier and combining the DNA extracted from the second tumor is used to predict a tumor type from gene expression sample with the DNA extracted from the first tumor sample data of a sample , and the predicted tumor type is to form combined extracted DNA , and preparing a second compared to the asserted tumor type ( e.g. , the tumor library of DNA fragments from the extracted DNA com type provided along with the nucleic acid data ). In prises preparing a library of DNA fragment from the com some embodiments, further analysis is performed if the bined extracted DNA . predicted and asserted tumor types match . [ 0637 ] In some embodiments, the method further com [ 0628 ] x ) In some embodiments, an RNA sequence type ses placing the first sample in a first cryogenic tube, the classifier is used to predict the library type from RNA first cryogenic tube comprising a composition that is able to sequence data ( e.g. , based on specific gene expression penetrate the sample and protect DNA and / or RNA therein levels or patterns ). In some embodiments, further analysis is performed if the predicted library type from degradation. In some embodiments, the method further matches an asserted library type for the sample being comprises snap freezing the contents of the first cryogenic analyzed . tube . [ 0629 ] xi ) In some embodiments, an MHC allele com [ 0638 ] In some embodiments, the method further com position is determined for two or more samples ( e.g. , prises placing the first sample of blood in a vacutainer from tumor and / or normal tissue ) from the same sub comprising an anticoagulant. In some embodiments, the ject . In some embodiments , further analysis is per method further comprises snap freezing the contents of the formed if the MHC allele compositions for the two or vacutainer. In some embodiments, the snap - frozen contents more samples match . of the cryogenic tube and / or vacutainer are stored for up to [ 0630 ] In some embodiments, one or more of the steps 7 months at -65 ° C. to -80 ° C. described above are performed . If sequence data ( e.g. , RNA [ 0639 ] In some embodiments, the first tumor sample is at and / or DNA sequence data ) fails to satisfy one or more of least 20 mg in weight, consist of at least 2x10 cells , or these quality control steps , the sequence data can be provides at least 1 ug of RNA upon RNA extraction . excluded from further analysis. In some embodiments, addi [ 0640 ] In some embodiments, the method further com tional sequence data can be obtained for a subject for which prises: forming a single - cell suspension of cells from the an initial set of sequence data did not satisfy one or more first sample of tumor ; and performing mass cytometry on at quality control criteria . least a first part of the single - cell suspension , the at least the first part of the single - cell suspension comprising at least Example Embodiments 5x10 cells . [ 0631 ] Some embodiments provide for a method compris [ 0641 ] In some embodiments, the method further com ing : obtaining a first sample of a first tumor from a subject prises forming a lysate from at least a second part of the having, suspected of having, or at risk of having cancer ; single - cell suspension , the at least the second part of the extracting RNA from the first sample of the first tumor ; single - cell suspension comprising at least 2x106 cells ; enriching the RNA for coding RNA to obtain enriched RNA ; extracting RNA from the lysate; performing RNA sequenc preparing a first library of DNA fragments from the enriched ing on the extracted RNA to obtain RNA expression data ; RNA for non - stranded RNA sequencing; and performing and / or determining whether the first tumor is heterogeneous non - stranded RNA sequencing on the first library of DNA based on the RNA expression data . fragments prepared from the enriched RNA . [ 0642 ] In some embodiments, forming the single - cell sus [ 0632 ] In some embodiments, the method further com pension of cells comprises: dissecting the first tumor sample prises extracting DNA from the first sample of the tumor; to obtain tumor sample fragments; incubating the tumor preparing a second library of DNA fragments from the sample fragments in an enzyme cocktail, the enzyme cock extracted DNA ; and performing whole exome sequencing tail comprising penicillin and / or streptomycin, collagenase ( WES ) on the second library of DNA fragments . I , and collagenase IV ; and filtering the enzyme cocktail [ 0633 ] In some embodiments, the method further com through a 70 um cell strainer. prises: obtaining a first sample of blood from the subject; [ 0643 ] In some embodiments, the first sample of blood is extracting DNA from the first sample of blood ; preparing a at least 0.5-1.0 ml in volume . third library of DNA fragments from the DNA extracted [ 0644 ] In some embodiments , the RNA extracted from from the first sample of blood ; and performing whole exome either the first sample or the second sample is at least sequencing ( WES ) on the third library of DNA . 1000-6000 ng in total mass , has a purity corresponding to a [ 0634 ] In some embodiments, the method further com prises obtaining a second sample of a second tumor from the ratio of absorbance at 260 nm to absorbance at 280 nm of at subject. In some embodiments, the first tumor and the least 2.0 . second tumor are a same tumor . In some embodiments, the [ 0645 ] In some embodiments , the DNA extracted from the first and second tumors are different tumors . first sample is at least 1000-2000 ng in total mass , in at least [ 0635 ] In some embodiments , the method further com 10 ul of solution , of a concentration of 100-200 ng / ul , and prises combining the first and second tumor samples to form has a purity corresponding to a ratio of absorbance at 260 nm a combined tumor sample, and extracting the RNA com to absorbance at 280 nm of at least 1.8 . prises extracting the RNA from the combined tumor sample . [ 0646 ] In some embodiments, enriching the RNA for [ 0636 ] In some embodiments, the method further com coding RNA comprises performing polyA enrichment. prises: extracting RNA from the second sample ; and com [ 0647 ] In some embodiments , the WES performed on the bining the RNA extracted from the second sample with the second library of DNA fragments , and the WES performed RNA extracted from the first sample to form combined on the third library of DNA fragments have at least 100 bp extracted RNA , and wherein enriching the RNA for coding paired -end reads, and an estimated coverage of at least 100x . US 2021/0005284 A1 Jan. 7 , 2021 74

[ 0648 ] In some embodiments, the WES has an estimated bias - corrected gene expression data ; and identifying a can coverage of at least 150x . cer treatment for the subject using the bias - corrected gene expression data . [ 0649 ] In some embodiments , the RNA sequencing on the [ 0657 ] Some embodiments provide for a method for pro first library of DNA fragments has at least 100 bp paired - end cessing RNA expression data , the method comprising using reads , and an estimated total number of reads of at least 50 at least one computer hardware processor to perform : million paired - end reads. obtaining RNA expression data for a subject having, sus [ 0650 ] In some embodiments, the RNA sequencing on the pected of having, or at risk of having cancer ; aligning and first library of DNA fragments at least 100 bp paired - end annotating genes in the RNA expression data with known reads , and an estimated total number of reads of at least 100 sequences of the human genome to obtain annotated RNA million paired - end reads . expression data ; removing non -coding transcripts from the [ 0651 ] In some embodiments , the method further com annotated RNA expression data ; converting the annotated prises subjecting a sample of any one of the prepared RNA expression data to gene expression data in transcripts libraries of DNA fragments to quality control tests to evalu per kilobase million ( TPM ) ; identifying at least one gene ate their integrity and / or peak size , wherein each sample of that introduces bias in the gene expression data ; removing the prepared libraries comprises up to 1 ng of a library . the at least one gene from the gene expression data to obtain bias - corrected gene expression data ; and identifying a can [ 0652 ] In some embodiments, the subject is human . cer treatment for the subject using the bias - corrected gene [ 0653 ] Some embodiments provide for a kit, comprising: expression data . a composition that is able to penetrate tissue and protect [ 0658 ] In some embodiments, identifying the at least one DNA and / or RNA therein from degradation ; at least one tool gene from the gene expression data comprises identifying at for dissecting a sample of tumor and preparing a single -cell least one gene having an average transcript length at least a suspension therefrom ; at least one reagent for snap - freezing threshold amount higher or lower than an average length of of biological samples; an anticoagulant; at least one vacu transcripts in the gene expression data . tainer, at least one reagent for extracting DNA and RNA [ 0659 ] In some embodiments, identifying the at least one from tissue samples and blood ; and at least one reagent for gene from the gene expression data comprises identifying at preparing DNA libraries from DNA and / or RNA samples. least one gene having at least a threshold variation in [ 0654 ] Some embodiments provide for a kit for use in a average transcript expression level based on transcript method according to any of the preceding examples . expression levels in reference samples. [ 0655 ] Some embodiments provide for a system compris [ 0660 ] In some embodiments, identifying the at least one ing at least one computer hardware processor and at least gene from gene expression data comprises identifying one or one non - transitory computer - readable storage medium stor more genes having a polyA tail that is at least a threshold ing processor - executable instructions that, when executed amount smaller in length compared to an average length of by the at least one computer hardware processor, cause the polyA tails of genes from a sample from which the RNA at least one computer hardware processor to perform a expression data was obtained . method for processing RNA expression data . The method [ 0661 ] In some embodiments, the at least one gene comprises using at least one computer hardware processor to belongs to a family of genes selected from the group perform : obtaining RNA expression data for a subject hav consisting of: histone - encoding genes , mitochondrial genes , ing , suspected of having , or at risk of having cancer ; aligning interleukin - encoding genes , collagen - encoding genes , and annotating genes in the RNA expression data with B - cell receptor - encoding genes, and T cell receptor- encod known sequences of the human genome to obtain annotated ing genes. RNA expression data ; removing non - coding transcripts from [ 0662 ] In some embodiments, the at least one gene com the annotated RNA expression data; converting the anno prises at least one histone - encoding gene selected from the tated RNA expression data to gene expression data in group consisting of: HIST1H1A , HIST1H1B , HISTIHIC , transcripts per kilobase million ( TPM ) ; identifying at least HIST1HID , HISTIHIE , HIST1H1T, HIST1H2AA , one gene that introduces bias in the gene expression data ; HIST1H2AB , HIST1H2AC , HIST1H2AD , HIST1H2AE , removing the at least one gene from the gene expression data HIST1H2AG , HIST1H2AH , HIST1H2AI , HIST1H2AJ , to obtain bias - corrected gene expression data ; and identify HIST1H2AK , HIST1H2AL , HIST1H2AM , HIST1H2BA , ing a cancer treatment for the subject using the bias HIST1H2BB , HIST1H2BC , HIST1H2BD , HIST1H2BE , corrected gene expression data . HIST1H2BF , HIST1H2BG , HIST1H2BH , HIST1H2BI , [ 0656 ] Some embodiments provide for at least one non HIST1H2BJ , HIST1H2BK , HIST1H2BL , HIST1H2BM , transitory computer - readable storage medium storing pro HIST1H2BN , HIST1H2BO , HIST1H3A , HIST1H3B , cessor - executable instructions that, when executed by at HIST1??? , HIST1H3D , HISTI??? , HIST1H3F, , least one computer hardware processor, cause the at least HIST1H3G , HIST1H3H , HIST1H31 , HIST1H3J , one computer hardware processor to perform a method for HIST1H4A , HIST1H4B , HIST1H4C , HIST1H4D , processing RNA expression data . The method comprises HIST1H4E , HIST1H4F, HIST1H4G , HIST1H4H , using at least one computer hardware processor to perform : HIST1H41 , HIST1H4J , HIST1H4K , HIST1H4L , obtaining RNA expression data for a subject having, sus HIST2H2AA3, HIST2H2AA4, HIST2H2AB , pected of having, or at risk of having cancer , aligning and HIST2H2AC , HIST2H2BE , HIST2H2BF , HIST2H3A , annotating genes in the RNA expression data with known HIST2H3C , HIST2H3D , HIST2H3PS2 , HIST2H4A , HIS sequences of the human genome to obtain annotated RNA T2H4B , HIST3H2A , HIST3H2BB , HIST3H3 , and expression data ; removing non - coding transcripts from the HIST4H4 . annotated RNA expression data ; converting the annotated [ 0663 ] In some embodiments, the at least one gene com RNA expression data to gene expression data in transcripts prises at least one mitochondrial gene selected from the per kilobase million ( TPM ) ; identifying at least one gene group consisting of: MT- ATP6 , MT - ATP8 , MT -C01 , MT that introduces bias in the gene expression data; removing CO2 , MT - CO3, MT -CYB , MT- ND1 , MT - ND2, MT -ND3 , the at least one gene from the gene expression data to obtain MT- ND4 , MT- ND4L , MT- ND5, MT -ND6 , MT- RNR1, MT US 2021/0005284 A1 Jan. 7 , 2021 75

RNR2, MT- TA , MT- TC , MT - TD , MT - TE , MT- TF, MT - TG , expression data associated with the at least one gene to MT - TH , MT - TI, MT - TK , MT- TL1, MT - TL2 , MT- TM , MT obtain bias - corrected gene expression data ; and identifying TN , MT - TP , MT - TQ , MT- TR , MT - TS1, MT - TS2 , MT - TT , a therapy for the subject using the bias - corrected gene MT - TV , MT - TW , MT- TY, MTRNR2L1, MTRNR2L10 , expression data . In some embodiments, the method further MTRNR2L11 , MTRNR2L12 , MTRNR2L13 , MTRNR2L3 , comprises administering to the subject the identified therapy. MTRNR2L4, MTRNR2L5, MTRNR2L6 , MTRNR2L7, and [ 0672 ] In some embodiments, identifying the therapy for MTRNR2L8 . the subject using the bias - corrected gene expression data [ 0664 ] In some embodiments , the RNA expression data is comprises: determining, using the bias - corrected gene characterized by at least 100 bp paired - end reads, and an expression data, a plurality of gene group expression levels estimated coverage of at least 50 million paired - end reads . comprising a gene group expression level for each gene [ 0665 ] In some embodiments, the RNA expression data is group in a set of gene groups , wherein the set of gene groups characterized by at least 100 bp paired - end reads, and an comprises at least one gene group associated with cancer estimated total number of reads of at least 100 million malignancy, and at least one gene group associated with paired - end reads. cancer microenvironment; and identifying the therapy using [ 0666 ] In some embodiments, aligning genes in the RNA the determined plurality of gene group expression levels. expression data is performed using a GRCh38 genome [ 0673 ] In some embodiments , identifying the at least one assembly. gene that introduces bias in the gene expression data com [ 0667 ] In some embodiments, annotating the genes in the prises identifying at least one gene having an average RNA expression data is based on GENCODE V23 compre transcript length at least a threshold amount higher or lower hensive annotation (www.gencodegenes.org ). than an average length of transcripts in the gene expression [ 0668 ] In some embodiments, the removed non - coding data . transcripts belong to groups selected from the list consisting [ 0674 ] In some embodiments, identifying the at least one of: pseudogenes, polymorphic pseudogenes, processed gene that introduces bias in the gene expression data com pseudogenes, transcribed processed pseudogenes, unitary prises identifying at least one gene having at least a thresh pseudogenes, unprocessed pseudogenes, transcribed unitary old variation in average transcript expression level based on pseudogenes, constant chain immunoglobulin ( IG C ) transcript expression levels in reference samples. pseudogenes, joining chain immunoglobulin ( IG J ) pseudo [ 0675 ] In some embodiments , identifying the at least one genes, variable chain immunoglobulin ( IG V ) pseudogenes, gene comprises identifying one or more genes having a transcribed unprocessed pseudogenes, translated unpro polyA tail that is at least a threshold amount smaller in cessed pseudogenes , joining chain T cell receptor ( TR J ) length compared to an average length of polyA tails of genes pseudogenes, variable chain T cell receptor ( TR V ) pseudo from a sample from which the RNA expression data was genes , small nuclear RNAs ( snRNA ), small nucleolar RNAs obtained . ( snoRNA ), microRNAs (miRNA ), ribozymes, ribosomal [ 0676 ] In some embodiments, the at least one gene RNA ( TRNA ), mitochondrial tRNAs ( Mt tRNA ), mitochon belongs to a family of genes selected from the group drial rRNAs ( Mt rRNA ), small Cajal body - specific RNAs consisting of : histone -encoding genes, mitochondrial genes, ( scaRNA ), retained introns, sense intronic RNA , sense over interleukin - encoding genes, collagen - encoding genes , lapping RNA , nonsense -mediated decay RNA , non - stop B - cell receptor -encoding genes, and T cell receptor- encod decay RNA , antisense RNA , long intervening noncoding ing genes. RNAs ( lincRNA ), macro long non - coding RNA ( macro [ 0677 ] In some embodiments , the at least one gene com IncRNA ), processed transcripts, 3prime overlapping non prises at least one histone - encoding gene selected from the coding RNA ( 3prime overlapping ncrna ), small RNAs group consisting of : HIST1H1A , HISTIH1B , HISTIHIC , ( sRNA ), miscellaneous RNA (misc RNA ), vault RNA ( vaul HISTIHID HIST1HIE, HISTIHIT, HIST1H2AA , TRNA ), and TEC RNA . HIST1H2AB , HIST1H2AC , HIST1H2AD , HIST1H2AE , [ 0669 ] In some embodiments , the RNA expression data HIST1H2AG , HIST1H2AH , HIST1H2AI , HIST1H2AJ , has been obtained by performing RNA sequencing on one or HIST1H2AK , HIST1H2AL , HIST1H2AM , HIST1H2BA , more samples of a subject's tumor . HIST1H2BB , HIST1H2BC , HIST1H2BD , HIST1H2BE , [ 0670 ] In some embodiments , identifying the cancer treat HIST1H2BF , HIST1H2BG , HIST1H2BH , HIST1H2BI , ment for the subject using the bias -corrected gene expres HIST1H2BJ , HIST1H2BK , HIST1H2BL , HIST1H2BM , sion data comprises: determining, using the bias - corrected HIST1H2BN , HIST1H2BO , HIST1H3A , HIST1??? , gene expression data , a gene group expression level for each HIST1H3C ,, HIST1H3D , HIST1??? , HIST1??? ,, gene group in a set of gene groups, wherein the set of gene HIST1H3G , HIST1H3H , HIST1H31 , HIST1H3J , group comprises at least one gene group associated with HIST1H4A , HIST1H4B , HIST1H4C , HIST1H4D , cancer malignancy, and at least one gene group associated HIST1H4E , HIST1H4F , HIST1H4G , HIST1H4H , with cancer microenvironment; and identifying the cancer HIST1H41 , HIST1H4J , HIST1H4K , HIST1H4L , treatment using the determined gene group expression lev HIST2H2AA3, HIST2H2AA4 , HIST2H2AB , els . In some embodiments , the method further comprises: HIST2H2AC , HIST2H2BE , HIST2H2BF , HIST2H3A , administering the cancer treatment to the subject. HIST2H3C , HIST2H3D, HIST2H3PS2 , HIST2H4A , [ 0671 ] Some embodiments provide for a method compris HIST2H4B , HIST3H2A , HIST3H2BB , HIST3H3 , and ing : enriching RNA for coding RNA in a sample of extracted HIST4H4 . RNA from a first tumor sample from a subject having, [ 0678 ] In some embodiments , the at least one gene com suspected of having, or at risk of having cancer ; performing prises at least one mitochondrial gene selected from the non - stranded RNA sequencing on a first library of cDNA group consisting of: MT - ATP6 , MT- ATP8 , MT- C01, MT fragments prepared from the enriched RNA to obtain RNA CO2 , MT -CO3 , MT- CYB , MT- ND1 , MT -ND2 , MT -ND3 , expression data; converting the RNA expression data to gene MT -ND4 , MT- ND4L , MT- ND5 , MT -NDO , MT - RNR1, MT expression data in transcripts per kilobase million ( TPM ) ; RNR2 , MT- TA , MT - TC , MT- TD , MT - TE , MT - TF , MT- TG , identifying at least one gene that introduces bias in the gene MT- TH , MT- TI, MT- TK , MT - TL1, MT- TL2 , MT- TM , MT expression data ; removing, from the gene expression data , TN , MT- TP , MT- TQ , MT - TR , MT- TS1, MT- TS2 , MT - TT, US 2021/0005284 A1 Jan. 7 , 2021 76

MT - TV , MT- TW , MT- TY, MTRNR2L1, MTRNR2L10 , [ 0687 ] In some embodiments, enriching the RNA for MTRNR2L11 , MTRNR2L12 , MTRNR2L13 , MTRNR2L3, coding RNA comprises performing polyA enrichment. MTRNR2L4, MTRNR2L5, MTRNR2L6 , MTRNR2L7, and MTRNR2L8 . [ 0688 ] Some embodiments provide for a system , compris [ 0679 ] In some embodiments, the RNA expression data is ing : at least one computer hardware processor ; and at least characterized by at least 100 bp paired - end reads, and an one non - transitory computer - readable storage medium stor estimated read depth of at least 50 million paired - end reads . ing processor -executable instructions that, when executed [ 0680 ] In some embodiments , the method further com by the at least one computer hardware processor, cause the prises: aligning and annotating genes in the RNA expression at least one computer hardware processor to perform : a data with known sequences of the human genome to obtain method , comprising: ( a ) obtaining nucleic acid data com annotated RNA expression data before identifying at least prising: ( i ) sequence data comprising at least 5 kilobases one gene that introduces bias in the gene expression data , ( kb ) of DNA and / or RNA , the sequence data obtained by aligning genes in the RNA expression data is performed sequencing a biological sample of a subject having, sus using a GRCh38 genome assembly, and annotating the genes pected of having, or at risk of having a disease ; and ( ii ) in the RNA expression data is performed using a GEN asserted information indicating an asserted source and/ or an CODE V23 comprehensive annotation (www.gencodegene asserted integrity of the sequence data ; and ( b ) validating the s.org ) nucleic acid data by : ( i ) processing the sequence data to [ 0681 ] In some embodiments , the method further com obtain determined information indicating a determined prises: removing non - coding transcripts from the RNA source and / or a determined integrity of the sequence data ; expression data , wherein the removed non - coding tran and ( ii ) determining whether the determined information scripts belong to groups selected from the list consisting of: matches the asserted information . pseudogenes, polymorphic pseudogenes , processed pseudo [ 0689 ] Some embodiments provide for at least one non genes, transcribed processed pseudogenes, unitary pseudo transitory computer -readable storage medium storing pro genes, unprocessed pseudogenes, transcribed unitary cessor - executable instructions that, when executed by at pseudogenes, constant chain immunoglobulin ( IG C ) least one computer hardware processor, cause the at least pseudogenes, joining chain immunoglobulin ( IG I ) pseudo one computer hardware processor to perform : a method , genes , variable chain immunoglobulin ( IG V ) pseudogenes , comprising : ( a ) obtaining nucleic acid data comprising : ( i ) transcribed unprocessed pseudogenes, translated unpro sequence data comprising at least 5 kilobases ( kb ) of DNA cessed pseudogenes , joining chain T cell receptor ( TR J ) and / or RNA , the sequence data obtained by sequencing a pseudogenes, variable chain T cell receptor ( TR V ) pseudo biological sample of a subject having, suspected of having, genes, small nuclear RNAs ( snRNA ), small nucleolar RNAs or at risk of having a disease ; and ( ii ) asserted information ( snoRNA ), microRNAs (miRNA ), ribozymes, ribosomal indicating an asserted source and / or an asserted integrity of RNA ( TRNA ), mitochondrial tRNAs ( Mt tRNA ), mitochon the sequence data ; and ( b ) validating the nucleic acid data drial rRNAs ( Mt rRNA ), small Cajal body - specific RNAs by : ( i ) processing the sequence data to obtain determined ( scaRNA ), retained introns, sense intronic RNA , sense over information indicating a determined source and / or a deter lapping RNA , nonsense -mediated decay RNA , non - stop mined integrity of the sequence data ; and ( ii ) determining decay RNA , antisense RNA , long intervening noncoding whether the determined information matches the asserted RNAs ( lincRNA ), macro long non - coding RNA ( macro information . IncRNA ), processed transcripts , 3prime overlapping non [ 0690 ] Some embodiments provide for a method , com coding RNA ( 3prime overlapping ncrna ), small RNAs prising: ( a ) obtaining nucleic acid data comprising : ( i ) ( sRNA ), miscellaneous RNA (misc RNA ), vault RNA ( vaul sequence data comprising at least 5 kilobases ( kb ) of DNA tRNA ), and TEC RNA . and / or RNA , the sequence data obtained by sequencing a [ 0682 ] In some embodiments , the method further com biological sample of a subject having, suspected of having, prises: obtaining a first sample of a first tumor from a subject or at risk of having a disease ; and ( ii ) asserted information having or suspected of having cancer, and extracting RNA indicating an asserted source and /or an asserted integrity of from the first sample of the first tumor to obtain the sample the sequence data ; and ( b ) validating the nucleic acid data of extracted RNA ; before enriching the RNA for coding by : ( i ) processing the sequence data to obtain determined RNA . In some embodiments, the method further comprises information indicating a determined source and / or a deter obtaining a second sample of a second tumor from the mined integrity of the sequence data ; and ( ii ) determining subject whether the determined information matches the asserted [ 0683 ] In some embodiments , the method further com information . prises: combining the first and second samples to form a [ 0691 ] In some embodiments, when it is determined that combined tumor sample, and extracting the RNA comprises the asserted information matches the determined informa extracting the RNA from the combined tumor sample. tion : ( i ) accessing a database of disease features; and ( ii ) [ 0684 ] In some embodiments, the method further com processing the sequence data to determine whether it is prises: extracting RNA from the second sample ; combining indicative of one or more of the disease features ; and ( d ) the RNA extracted from the second sample with the RNA when it is determined that the asserted information does not extracted from the first sample to form combined extracted match the determined information : ( i ) indicating to a user RNA , and enriching the RNA for coding RNA comprises that the determined and asserted information do not match ; enriching the combined extracted RNA for coding RNA . ( ii ) excluding the sequence data from further analysis ; [ 0685 ] In some embodiments, the sample of extracted and / or ( iii ) obtaining additional sequence data and / or other RNA comprises at least 1 ug of RNA upon RNA extraction . information about the biological sample and /or the subject. [ 0686 ] In some embodiments, the extracted RNA is at [ 0692 ] In some embodiments, the asserted information for least 1000-6000 ng in total mass , has a purity corresponding the sequence data is based on ( one , at least two, at least to a ratio of absorbance at 260 nm to absorbance at 280 nm three , between 2 and 10 , between 5 and 10 pieces of) of at least 2.0 . information selected from the group consisting of: MHC US 2021/0005284 A1 Jan. 7 , 2021 77 allele sequence information ; nucleic acid type ; subject iden [ 0705 ] In some embodiments, the method further com tity ; sample identity ; tissue type from which the sample was prises generating a report that indicates an extent of a match obtained ; between one or more features that are determined from the tumor type from which the sample was obtained ; sequencing sequence data and one or more corresponding asserted platform used to generate the sequence data ; sequence features in the asserted information . integrity; polyA status of an RNA sample ( e.g. , indicating whether the RNA sample was polyA enriched ); total EQUIVALENTS AND SCOPE sequence coverage ; exon coverage ; chromosomal coverage ; ratio of expression levels of nucleic acids encoding two or [ 0706 ] All of the features described in this specification more subunits of the same protein; contamination ; single may be combined in any combination . Each feature nucleotide polymorphisms ( SNPs ) ; complexity ; and / or gua described in this specification may be replaced by an alter nine ( G ) and cytosine ( C ) percentage ( % ). native feature serving the same , equivalent, or similar pur [ 0693 ] In some embodiments , the determined information pose . Thus , unless expressly stated otherwise , each feature for the sequence data is based on ( one , at least two, at least described is only an example of a generic series of equiva three , between 2 and 10 , between 5 and 10 uleces of) lent or similar features . information selected from the group consisting of : MHC [ 0707 ] All of the features described in this specification allele sequence information; nucleic acid type ; subject iden may be combined in any combination . Each feature tity ; sample identity ; tissue type from which the sample was described in this specification may be replaced by an alter obtained ; tumor type from which the sample was obtained ; native feature serving the same , equivalent, or similar pur sequencing platform used to generate the sequence data ; pose . Thus, unless expressly stated otherwise , each feature sequence integrity ; polyA status of an RNA sample ( e.g. , described is only an example of a generic series of equiva indicating whether the RNA sample was polyA enriched ); lent or similar features . total sequence coverage ; exon coverage ; chromosomal cov [ 0708 ] From the above description, one skilled in the art erage ; ratio of expression levels of nucleic acids encoding can easily ascertain the essential characteristics of the pres two or more subunits of the same protein ; contamination ; ent disclosure, and without departing from the spirit and single nucleotide polymorphisms ( SNPs ) ; complexity ; and / scope thereof, can make various changes and modifications or guanine ( G ) and cytosine ( C ) percentage ( % ) . of the disclosure to adapt it to various usages and conditions . [ 0694 ] In some embodiments , the disease is cancer. In Thus, other embodiments are also within the claims . some embodiments, the subject is human . [ 0709 ] The terms “ program ” or “ software ” are used herein [ 0695 ] In some embodiments, the source of the sequence in a generic sense to refer to any type of computer code or data is a subject, a tissue type , a tumor type , an RNA set of processor - executable instructions that can be sequence type , or a DNA sequence type . employed to program a computer or other processor (physi cal or virtual) to implement various aspects of embodiments [ 0696 ] In some embodiments, the subject from which the as described above . Additionally, according to one aspect , sequence data is obtained is evaluated by determining one or one or more computer programs that when executed perform more MHC sequences, for example, by determining MHC methods of the technology described herein need not reside sequences for six MHC loci . on a single computer or processor, but may be distributed in [ 0697 ] In some embodiments, the source of one or more a modular fashion among different computers or processors nucleic acid sequence data sets is evaluated by determining to implement various aspects of the technology described a SNP concordance for the nucleic acid sequence data sets . herein . [ 0698 ] In some embodiments, the integrity of the sequence [ 0710 ) Processor - executable instructions may be in many data is evaluated by determining exon coverage, one or more forms, such as program modules, executed by one or more ratios of protein subunit encoding nucleic acids , and / or gene computers or other devices . Generally, program modules coverage of the sequence data . include routines , programs, objects , components, data struc [ 0699 ] In some embodiments, the integrity of RNA tures, etc. that performs particular tasks or implement par sequence data is evaluated by determining coverage of one ticular abstract data types. Typically , the functionality of the or more genes in the RNA sequence data . program modules may be combined or distributed . [ 0700 ] In some embodiments, the integrity of RNA [ 0711 ] Also , data structures may be stored in one or more sequence data is evaluated by determining a relative cover non - transitory computer - readable storage media in any suit age of two or more exons for at least one gene in the RNA able form . For simplicity of illustration, data structures may sequence data . be shown to have fields that are related through location in [ 0701 ] In some embodiments , the integrity of RNA the data structure . Such relationships may likewise be sequence data is evaluated by determining an expression achieved by assigning storage for the fields with locations in ratio of two known reference genes in the RNA sequence a non - transitory computer -readable medium that convey data . relationship between the fields. However, any suitable [ 0702 ] In some embodiments , the method further com mechanism may be used to establish relationships among prises determining a level of nucleic acid degradation , information in fields of a data structure , including through contamination , and / or GC content . the use of pointers, tags or other mechanisms that establish [ 0703 ] In some embodiments, determining whether RNA relationships among data elements . sequence data is polyA RNA sequence data or total RNA [ 0712 ] Various inventive concepts may be embodied as sequence data comprises determining the expression level of one or more processes, of which examples have been one or more mitochondrial and / or histone genes in the RNA provided . The acts performed as part of each process may be sequence data . ordered in any suitable way. Thus , embodiments may be [ 0704 ] In some embodiments, the sequencing platform constructed in which acts are performed in an order different that was used for generating WES sequence data is identified than illustrated , which may include performing some acts by determining a percent ( % ) variance for one or more simultaneously, even though shown as sequential acts in reference genes in the WES sequence data . illustrative embodiments .