US 2015/0315643 A1
(19) United States
(12) Patent Application Publication, O'Garra et al.
(10) Pub. No.: US 2015/0315643 A1
(43) Pub. Date: Nov. 5, 2015

(54) BLOOD TRANSCRIPTIONAL SIGNATURES OF ACTIVE PULMONARY TUBERCULOSIS AND SARCOIDOSIS
(71) Applicants: BAYLOR RESEARCH INSTITUTE, Dallas, TX (US); MEDICAL RESEARCH COUNCIL, Swindon (GB); IMPERIAL COLLEGE HEALTHCARE NHS TRUST, London (GB)
(72) Inventors: Anne O'Garra, London (GB); Chloe Bloom, London (GB); Matthew Paul Reddoch Berry, London (GB); Jacques F. Banchereau, Montclair, NJ (US); Damien Chaussabel, Bainbridge Island, WA (US); Viginia Maria Pascual, Dallas, TX (US)
(21) Appl. No.: 14/651,989
(22) PCT Filed: Dec. 13, 2013
(86) PCT No.: PCT/US2013/075097
§ 371 (c)(1), (2) Date: Jun. 12, 2015
Related U.S. Application Data
(60) Provisional application No. 61/736,908, filed on Dec. 13, 2012.
Publication Classification
(51) Int. Cl. C12Q 1/68 (2006.01)
(52) U.S. Cl. CPC ...... C12Q 1/6883 (2013.01); C12Q 1/6886 (2013.01); C12Q 2600/158 (2013.01); C12Q 2600/16 (2013.01); C12Q 2600/118 (2013.01); C12Q 2600/112 (2013.01); C12Q 2600/106 (2013.01)
(57) ABSTRACT
The present invention includes a method of determining a lung disease from a patient suspected of sarcoidosis, tuberculosis, lung cancer or pneumonia comprising: obtaining a sample from whole blood of the patient suspected of sarcoidosis, tuberculosis, lung cancer or pneumonia; detecting expression of (although not exclusive) six or more disease genes, markers, or probes selected from SEQ ID NOS.: 1 to 1446, wherein increased expression of mRNA of upregulated sarcoidosis, tuberculosis, lung cancer and pneumonia markers of SEQ ID NOS.: 1 to 1446 and/or decreased expression of mRNA of downregulated sarcoidosis, tuberculosis, lung cancer or pneumonia markers of SEQ ID NOS.: 1 to 1446 relative to the expression of the mRNAs from a normal sample; and determining the lung disease based on the expression level of the six or more disease markers of SEQ ID NOS.: 1 to 1446 based on a comparison of the expression level of sarcoidosis, tuberculosis, lung cancer, and pneumonia.

Patent Application Publication Nov. 5, 2015 Sheet 1 of 21 US 2015/0315643 A1

[Sheet 1 of 21: drawing not reproduced. Legible group labels: CONTROLS, PNEUMONIA, TB, SARCOIDOSIS, LUNG CANCER.]

[Sheets 2 and 3 of 21: drawings not reproduced. Legible group labels: CONTROLS, LUNG CANCER, PNEUMONIA, SARCOIDOSIS.]

[Sheet 4 of 21: drawing not reproduced. Legible group labels: ACTIVE SARCOIDOSIS, NON-ACTIVE SARCOIDOSIS, LUNG CANCER.]

[Sheet 5 of 21: drawings not reproduced.
FIG. 3B: y-axis WEIGHTED MOLECULAR DISTANCE TO HEALTH (ticks 200 to 1000); groups CONTROL, TB, ACTIVE SARCOID, NON-ACTIVE SARCOID, PNEUMONIA, CANCER; comparisons annotated with p<0.05, p<0.01 and p<0.001.
Layout key: FIG. 4 comprises FIG. 4A-1 and FIG. 4A-2; FIG. 6 comprises FIG. 6A-1 and FIG. 6A-2.]

[Sheet 6 of 21: FIG. 4A-1, module dot plot, not reproduced. Column groups: TB, ACTIVE SARCOIDOSIS, NON-ACTIVE SARCOID.]

[Sheet 7 of 21: FIG. 4A-2, module dot plot, not reproduced. Column groups: PNEUMONIA, LUNG CANCER. Legend: OVEREXPRESSED, UNDEREXPRESSED; dot scale PERCENTAGE PROBE (P<0.05), 100 to 20.
Module annotations: M4.10 B-CELLS; M9.2 B-CELLS; M4.1 T-CELLS; M5.15 NEUTROPHILS; M4.11 PLASMA CELLS; M4.14 MONOCYTES; M1.1 PLATELETS; M2.3 ERYTHROCYTES; M3.1 ERYTHROCYTES; M3.6 CYTOTOXIC/NK; M7.21 CYTOTOXIC; INTERFERON MODULES (M1.2 IFN; M3.4 IFN; M5.12 IFN); M7.18 IMMUNOSUPRESSION; M8.61 IMMUNE RESPONSES; M8.89 IMMUNE RESPONSES; M9.34 IMMUNOSUPRESSION; INFLAMMATION MODULES (M3.2; M4.2; M4.13; M5.1; M7.22 INFLAMMATION); M7.24 LYMPHOCYTE ACTIVATION; M2.1 CELL CYCLE; M2.2 CELL CYCLE; M3.3 CELL CYCLE; M3.5 CELL CYCLE; M6.11 CELL CYCLE; M6.13 CELL DEATH; M6.16 CELL CYCLE; M7.6 PROLIFERATION; M7.7 PROLIFERATION; M7.8 CELL CYCLE; M7.16 DC/APOPTOSIS; M5.10 MIT RESPIRATION; M6.2 MITOCHONDRIAL RESPIRATION; M6.12 MITOCHONDRIAL STRESS.]

[Sheet 8 of 21: FIGS. 4B to 4E, plots not reproduced. Legible group labels: ACTIVE SARCOID, NON-ACTIVE SARCOID, PNEUMONIA, CANCER; comparisons annotated with p<0.05 and p<0.001. Axis titles are not legible.]

[Sheet 9 of 21: FIG. 5A, bar charts not reproduced. Four panels, each comparing TB, ACTIVE SARCOID, PNEUMONIA and CANCER on two axes, PERCENTAGE OF GENES IN PATHWAY (0 to 60) and -log (B-H p-value) (0 to 15):
EIF2 SIGNALING (UNDER-ABUNDANT)
IFN SIGNALING (OVER-ABUNDANT)
ROLE OF PRRS IN RECOGNITION OF BACTERIA AND VIRUSES (OVER-ABUNDANT)
ANTIGEN PRESENTATION PATHWAY (OVER-ABUNDANT)]

[Sheets 10 to 13 of 21: drawings not reproduced; no legible text survives.]

[Sheet 14 of 21: FIG. 6A-1, module dot plot, not reproduced. Column labels: A, B, C, D, E; PRE-TREATMENT, POST-TREATMENT, PRE-TREATMENT; group label PNEUMONIA.]

[Sheet 15 of 21: FIG. 6A-2, module dot plot, not reproduced. Columns are labelled by PATIENT ID and grouped as INADEQUATE TREATMENT RESPONSE and GOOD TREATMENT RESPONSE; group label SARCOIDOSIS. Legend: OVEREXPRESSED, UNDEREXPRESSED; dot scale PERCENTAGE PROBE (P<0.05), 100 to 20.
Module annotations: M4.10 B-CELLS; M9.2 B-CELLS; M4.1 T-CELLS; M5.15 NEUTROPHILS; M4.11 PLASMA CELLS; M4.14 MONOCYTES; M1.1 PLATELETS; M2.3 ERYTHROCYTES; M3.1 ERYTHROCYTES; M3.6 CYTOTOXIC/NK; M7.21 CYTOTOXIC; INTERFERON MODULES (M1.2 IFN; M3.4 IFN; M5.12 IFN); M7.18 IMMUNOSUPRESSION; M8.61 IMMUNE RESPONSES; M8.89 IMMUNE RESPONSES; M9.34 IMMUNOSUPRESSION; INFLAMMATION MODULES (M3.2; M4.2; M4.13; M5.1; M7.22 INFLAMMATION); M7.24 LYMPHOCYTE ACTIVATION; M2.1 CELL CYCLE; M2.2 CELL CYCLE; M3.3 CELL CYCLE; M3.5 CELL CYCLE; M6.11 CELL CYCLE; M6.13 CELL DEATH; M6.16 CELL CYCLE; M7.6 PROLIFERATION; M7.7 PROLIFERATION; M7.8 CELL CYCLE; M7.16 DC/APOPTOSIS; M5.10 MIT RESPIRATION; M6.2 MITOCHONDRIAL RESPIRATION; M6.12 MITOCHONDRIAL STRESS.]

[Sheet 16 of 21: plots not reproduced.
FIG. 6B: groups CONTROL, UNTREATED PNEUMONIA, TREATED PNEUMONIA; y-axis ticks 0 to 1000; comparisons annotated p<0.001.
FIG. 6C: groups LATENT TB, PRE-TREATMENT ACTIVE TB, END OF TREATMENT; y-axis ticks 0 to 400; comparisons annotated p<0.001.
FIG. 6D: legible labels CONTROL, UNTREATED, TREATMENT RESPONDING, INADEQUATE RESPONDERS, SARCOIDOSIS; y-axis ticks 500 to 1500; comparison annotated p<0.001.]

[Sheet 17 of 21: drawing not reproduced; no reliably legible text survives.]

[Sheet 18 of 21: plots not reproduced.
FIG. 7B: TB (IFN GENES FROM BERRY et al 2010).
FIG. 7C: ACTIVE SARCOIDOSIS (IFN GENES FROM BERRY et al 2010).
FIG. 7D: TB (IFN MODULE).
FIG. 7E: ACTIVE SARCOIDOSIS (IFN MODULE).
Axis and category labels are not legible.]

[Sheet 19 of 21: plots not reproduced.
FIG. 8A: NEUTROPHIL MODULE, PERCENTAGE OF GENES (y-axis 0 to 60); groups TB, ACTIVE SARCOID, NON-ACTIVE SARCOID, PNEUMONIA, CANCER; comparisons annotated with p<0.01 and p<0.001.
FIG. 8B: NEUTROPHIL MODULE, FOLD CHANGE (y-axis 0 to 10); legible groups ACTIVE SARCOID, NON-ACTIVE SARCOID, PNEUMONIA, CANCER; comparisons annotated p<0.001.]

[Sheet 20 of 21: drawing not reproduced; no legible text survives.]

[Sheet 21 of 21: Venn diagrams not reproduced.
FIG. 10A: set labels MAERTZDORF PROBES and 144 ILLUMINA PROBES; region counts are not reliably legible.
FIG. 10B: 1396 LIST, DISTINGUISHES TB, ACTIVE SARCOIDOSIS, NON-ACTIVE SARCOIDOSIS, PNEUMONIA AND LUNG CANCER; 1446 LIST, DISTINGUISHES TB, SARCOIDOSIS, PNEUMONIA AND LUNG CANCER.]

US 2015/0315643 A1 Nov. 5, 2015

BLOOD TRANSCRIPTIONAL SIGNATURES OF ACTIVE PULMONARY TUBERCULOSIS AND SARCOIDOSIS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] None.

TECHNICAL FIELD OF THE INVENTION

[0002] The present invention relates in general to the field of medical diagnosis and medical treatment, and more particularly, to a novel blood transcriptional signatures to distinguish between active pulmonary tuberculosis, sarcoidosis, lung cancer and pneumonia.

STATEMENT OF FEDERALLY FUNDED RESEARCH

[0003] None.

INCORPORATION-BY-REFERENCE OF MATERIALS

[0004] A number of lengthy tables are included herewith and the content incorporated herein by reference. The text file Symbol-Regulation-ID.txt is 47 Kb, Symbol-Sequence-ID.txt is 92 Kb, and 1359-List.txt is 88 Kb and are filed herewith and incorporated by reference in their entirety.

LENGTHY TABLES
The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20150315643A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

BACKGROUND OF THE INVENTION

[0005] Without limiting the scope of the invention, its background is described in connection with transcriptional signatures. Over nine million new cases of active tuberculosis (TB), and 1.4 million deaths from TB, are estimated to occur around the world every year (1). One of the difficulties of curing pulmonary TB is the ability to diagnose the disease from other similar pulmonary diseases such as pulmonary sarcoidosis, community acquired pneumonia and lung cancer. TB and sarcoidosis are widespread multisystem diseases that preferentially involve the lung and present in a very similar clinical, radiological and histological manner. Distinguishing these diseases therefore often requires an invasive biopsy.

[0006] Granuloma formation is fundamental to both these diseases and although the aetiology of TB is well-recognised as the pathogen Mycobacterium tuberculosis, the predominant cause of sarcoidosis remains unknown (2). The underlying pathways of granulomatous inflammation are also poorly understood and there is little understanding of disease-specific differences. Both sarcoidosis and TB can affect adults within the same age group, who then present with similar pulmonary symptoms and radiological thoracic abnormalities (3, 4). TB can also display a similar presentation to other pulmonary infectious diseases such as community acquired pneumonia and other lung inflammatory disorders such as primary lung cancer. Due to the complexity of these diseases a systems biology approach offers the ability to help unravel the principal host immune responses. Peripheral blood has the capacity to reflect pathological and immunological changes in the body, and identification of disease-associated alterations can be determined by a blood transcriptional signature (5). In addition the applicants have published a IFN-inducible neutrophil blood transcriptional signature in active TB patients that is absent in the majority of latent individuals and healthy controls, that correlates significantly with the extent of lung radiographic disease (5) and is diminished upon treatment (5, 12).

[0007] Blood expression profiling has been successfully applied to other infectious and inflammatory disorders, such as systemic lupus erythematosus (SLE), to help understand disease mechanisms and improve diagnosis and treatment (5). Two recent studies have used blood transcriptional profiling for the comparison of pulmonary TB and sarcoidosis; both studies found the diseases had similar transcriptional responses, which involved the overexpression of IFN-inducible genes (9, 10). However, these studies did not differentiate signatures from other pulmonary diseases leaving to question if the transcriptional signatures were non-specific for pulmonary disorders.

SUMMARY OF THE INVENTION

[0008] In one embodiment, the present invention includes a method of determining if a human subject is afflicted with pulmonary disease comprising: obtaining a sample from a subject suspected of having a pulmonary disease; determining the expression level of six or more genes from each of the following genes expressed in one or more of the following expression pathways: EIF2 signaling; mTOR signaling; regulation of eIF4 and p70S6K signaling; interferon signaling; antigen presentation pathways; T cell signaling pathways; and other signaling pathways; comparing the expression level of the six or more genes with the expression level of the same genes from individuals not afflicted with a pulmonary disease, and determining the level of expression of the six or more genes in the sample from the subject relative to the samples from individuals not afflicted with a pulmonary disease for the genes expressed in the one or more expression pathways, wherein co-expression of genes in the EIF2 signaling and mTOR signaling pathways are indicative of active sarcoidosis; co-expression of genes in the regulation of eIF4 and p70S6K signaling pathways is indicative of pneumonia; co-expression of genes in the interferon signaling and antigen presentation pathways are indicative of tuberculosis; and co-expression of genes in the T cell signaling pathways; and other signaling pathways is indicative of lung cancer. In one aspect, the genes associated with tuberculosis are selected from at least 3, 4, 5 or 6 genes selected from ANKRD22; FCGR1A; SERPING1; BATF2; FCGR1C; FCGR1B;

LOC728744; IFITM3; EPSTI1; GBP5; IFI44L; GBP6; GBP1; LOC400759; IFIT3; AIM2; SEPT4; C1QB; GBP1; RSAD2; RTP4; CARD17; IFIT3; CASP5; CEACAM1; CARD17; ISG15; IFI27; TIMM10; WARS; IFI6; TNFAIP6; PSTPIP2; IFI44; SCO2; FBXO6; FER1L3; CXCL10; DHRS9; OAS1; STAT1; HP; DHRS9; CEACAM1; SLC26A8; CACNA1E; OLFM4; and APOL6, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with tuberculosis and not active sarcoidosis, pneumonia or lung cancer are selected from C1QB; IFI27; SMARCD3; SOCS1; KCNJ15; LPCAT2; ZDHHC19; FYB; SP140; IFITM1; ALAS2; CEACAM6; OAS2; C1QC; LOC100133565; ITGA2B; LY6E; SP140; CASP7; GADD45G; FRMD3; CMPK2; AQP10; CXCL14; ITPRIPL2; FAS; XK; CARD16; SLAMF8; SELP; NDN; OAS2; TAPBP; BPI; DHX58; GAS6; CPT1B; CD300C; LILRA6; USF1; C2; 38231.0; NFXL1; GCH1; CCR1; OAS2; CCR2; F2RL1; SNX20; and ARAP2, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with active sarcoidosis are selected from FCGR1A; ANKRD22; FCGR1C; FCGR1B; SERPING1; FCGR1B; BATF2; GBP5; GBP1; IFIT3; ANKRD22; LOC728744; GBP1; EPSTI1; IFI44L; INDO; IFITM3; GBP6; RSAD2; DHRS9; TNFAIP6; IFIT3; P2RY14; DHRS9; IDO1; STAT1; WARS; TIMM10; P2RY14; LOC389386; FER1L3; IFIT3; RTP4; SCO2; GBP4; IFIT1; LAP3; OASL; CEACAM1; LIMK2; CASP5; STAT1; CCL23; WARS; ATF3; IFI6; PSTPIP2; ASPRV1; FBXO6; and CXCL10, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with active sarcoidosis and not tuberculosis, pneumonia or lung cancer are selected from CCL23; PIK3R6; EMR4; CCDC146; KLF4; GRINA; SLC4A1; PLA2G7; GRAMD1B; RAPGEF1; NXNL1; TRIM58; GABBR1; TAGLN; KLF4; MFAP3L; LOC641798; RIPK2; LOC650840; FLJ43093; ASAP2; C15orf26; REC8; KIAA0319L; GRINA; FLJ30092; BTN2A1; HIF1A; LOC440313; HOXA1; LOC645153; ST3GAL6; LONRF1; PPP1R3B; MPPE1; LOC652699; LOC646144; SGMS1; BMP2K; SLC31A1; ARSB; CAMK1D; ICAM4; HIF1A; LOC641996; RNASE10; PI15; SLC30A1; LOC389124; and ATP1A3, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with pneumonia are selected from OLFM4; LTF; VNN1; HP; DEFA4; OPLAH; CEACAM8; DEFA1B; ELANE; C19orf59; ARG1; CDK5RAP2; DEFA1B; DEFA3; DEFA1B; FCGR1A; MMP8; FCGR1B; SLPI; SLC26A8; MAPK14; CAMP; NLRC4; FCAR; RNASE3; FCGR1B; NAIP; OLR1; FCGR1C; ANXA3; DEFA1; PGLYRP1; TCN1; ANKDD1A; COL17A1; SLC26A8; TMEM144; SAMD14; MAPK14; RETN; NAIP; GPR84; CASP5; MPO; MMP9; CR1; MYL9; CLEC4D; ITGAX; and ANKRD22, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with pneumonia and not tuberculosis, active sarcoidosis, or lung cancer are selected from DEFA4; ELANE; MMP8; OLR1; COL17A1; RETN; GPR84; LOC100134379; TACSTD2; SLC2A11; LOC100130904; MCTP2; AZU1; DACH1; GADD45A; NSUN7; CR1; CDK5RAP2; LOC284648; GPR177; CLEC5A; UPB1; SLC2A5; GPR177; APP; LAMC1; REPS2; PIK3CB; SMPDL3A; UBE2C; NDUFAF3; CDC20; CTSK; RAB13; LOC651524; TMEM176A; PDGFC; ATP9A; SV2A; SPOCD1; MARCO; CCDC109A; NUSAP1; SLCO4C1; CYP27A1; LOC644615; PKM2; BMX; PADI4; and NAMPT, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with lung cancer are selected from ARG1; TPST1; FCGR1A; C19orf59; SLPI; FCGR1B; IL1R1; FCGR1C; TDRD9; SLC26A8; FCGR1B; CLEC4D; LOC100132858; SLC22A4; LOC100133177; SIPA1L2; ANXA3; LIMK2; TMEM88; MMP9; ASPRV1; MANSC1; TLR5; CD163; CAMP; LOC642816; DPRXP4; LOC643313; NTN3; MRVI1; F5; SOCS3; TncRNA; MIR21; LOC100170939; LOC100129904; GRB10; ASGR2; LOC642780; LOC400499; FCAR; KREMEN1; SLC22A4; CR1; LOC730234; SLC26A8; C7orf53; VNN1; NLRC4; and LOC400499, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with lung cancer and not tuberculosis, active sarcoidosis, or pneumonia are selected from TPST1; MRVI1; C7orf53; ECHDC3; LOC651612; LOC100134660; TIAM2; KIAA1026; HECW2; TLE3; TBC1D24; LOC441193; CD163; RFX2; LOC100134688; LOC642342; FKBP9L; PHF20L1; LOC402176; CD163; OSBPL1A; PRMT5; UBTD1; ADORA3; SH2D3C; RBP7; ERGIC1; TMEM45B; CUX1; TREM1; C1GALT1C1; MAML3; C15orf29; DSC2; RRP12; LRP3; HDAC7A; FOS; C14orf4; LIPN; MAP1LC3B2; LOC400793; LOC647834; PHF20L1; CCNJL; SLC12A6; FLJ42957; CCDC147; SLC25A40; and LOC649270, wherein the genes are evaluated at least one of: in aggregate, in the order listed, aggregated into pathways, or selected from 7, 8, 9, 10, 11, 12, 13, 15, 20, 25, 35, 40, 45, or 49 genes. In another aspect, the genes associated with lung cancer and not tuberculosis, active sarcoidosis, or pneumonia are selected from wherein the genes associated with lung cancer and not tuberculosis, active sarcoidosis, or pneumonia are selected from Table 1 by: parsing the genes into the expression pathways, and determining that the subject is afflicted with a pulmonary disease selected from tuberculosis, sarcoidosis, cancer or pneumonia based on the from a sample obtained from the subject when compared to the level of expression of the genes in each of the expression pathways. In another aspect, the specificity is 90 percent or greater and sensitivity is 80 percent or greater for a diagnosis of tuberculosis or sarcoidosis. In another aspect, the method further comprises a method for displaying if the patient has tuberculosis or sarcoidosis aggregating the expression data from the 3, 4, 5, 6 or more genes into a single visual display of a vector of expression for tuberculosis, sarcoidosis, cancer or an infectious pulmonary disease. In another aspect, the method further comprises the step of detecting and evaluating 7, 8, 9, 10, 12, 15, 20, 25, 35, 50, 75, 90, 100, 125, or 144 genes for the analysis. In another aspect, the method further comprises the step of detecting and evaluating the EIF2 signaling; mTOR

signaling; regulation of eIF4 and p70S6K signaling; interferon signaling; antigen presentation pathways; T cell signaling pathways; and other signaling pathways from 7, 8, 9, 10, 12, 15, 20, 25, 35, 50, 75, 90, 100, 125, or 144 genes that are upregulated or downregulated and are selected from UBE2J2; ALPL; JMJD6; FCER1G; LILRA5; LY96; FCGR1C; C10orf33; GPR109B; PROK2; PIM3; SH3GLB1; DUSP3; PPAP2C; SLPI; MCTP1; KIF1B; FLJ32255; BAGE5; IFITM1; GPR109A; IFI35; LOC653591; KREMEN1; IL18R1; CACNA1E; ABCA2; CEACAM1; MXD4; TncRNA; LMNB1; H2AFJ; HP; ZNF438; FCER1A; SLC22A4; DISC1; MEFV; ABCA1; ITPRIPL2; KCNJ15; LOC728519; ERLIN1; NLRC4; B4GALT5; LOC653610; HIST2H2BE; AIM2; P2RY10; CCR3; EMR4P; NTN3; C1QB; TAOK1; FCGR1B; GATA2; FKBP5; DGAT2; TLR5; CARD17; INCA; MSL3L1; ESPN; LOC645159; C19orf59; CDK5RAP2; PLSCR1; RGL4; IFI30; LOC641710; GAGGCTTTCAGGTAGGAGGACAATGGTAGCACTGTAGGTCCCCAGTGTCG (SEQ ID NO.: 754); LOC100008589; LOC100008589; SMARCD3; NGFRAP1; LOC100132394; OPLAH; CACNG6; LILRB4; HIST2H2AA4; CYP1B1; PGS1; SPATA13; PFKFB3; HIST1H3D; SNORA73B; SLC26A8; SULT1B1; ADM; HIST2H2AA3; HIST2H2AA3; GYG1; CST7; EMR4; LILRA6; MEF2D; IFITM3; MSL3; DHRS13; EMR4; C16orf57; HIST2H2AC; EEF1D; TDRD9; GPR97; ZNF792; LOC100134364; SRGAP3; FCGR1A; HPSE; LOC728417; LOC728417; MIR21; HIST1H2BG; COP1; SMARCD3; LOC441763; ZSCAN18; GNG8; MTRF1L; ANKRD33; PLAC8; PLAC8; SLC26A8; AGTRAP; FLJ43093; LPCAT2; AGTRAP; S100A12; SVIL; LILRA5; LILRA5; ZFP91; CLC; LOC100133565; LTEB4R; SEPTO4; ANXA3; BHLHB2; IL4R; IFNAR1; MAZ; GCCCCCTAATTGACTGAATGGAACCCCTCTTGACCAAAGTGACCCCAGAA (SEQ ID NO.: 1379); OSM; and optionally excluding at least one of ADM, SEPT4, IFITM1, FCER1G, MED2F, CDK5RAP2 or CARD16. In another aspect, the genes that are downregulated are selected from MEF2D; BHLHB2; CLC; FCER1A; SRGAP3; FLJ43093; CCR3; EMR4; ZNF792; C10orf33; CACNG6; P2RY10; GATA2; EMR4P; ESPN; EMR4; MXD4; and ZSCAN18. In another aspect, the interferon inducible genes are selected from CD274; CXCL10; GBP1; GBP2; GBP5; IFI16; IFI35; IFI44; IFI44L; IFI6; IFIH1; IFIT2; IFIT3; IFIT5; IFITM1; IFITM3; IRF7; OAS1; OAS2; OAS3; SOCS1; STAT1; STAT2; TAP1; and TAP2. In another aspect, the sample is a blood, peripheral blood mononuclear cells, sputum, or lung biopsy. In another aspect, the expression level comprises a mRNA expression level and is quantitated by a method selected from the group consisting of polymerase chain reaction, real time polymerase chain reaction, reverse transcriptase polymerase chain reaction, hybridization, probe hybridization and gene expression array. In another aspect, the expression level is determined using at least one technique selected from the group consisting of polymerase chain reaction, heteroduplex analysis, single stand conformational polymorphism analysis, ligase chain reaction, comparative genome hybridization, Southern blotting, Northern blotting, Western blotting, enzyme-linked immunosorbent assay, fluorescent resonance energy-transfer and sequencing. In another aspect, the expression level is determined by microarray analysis that comprises use of oligonucleotides that hybridize to mRNA transcripts or cDNAs for the six or more genes, and wherein the oligonucleotides are disposed or directly synthesized on the surface of a chip or wafer. In another aspect, the oligonucleotides are about 10 to about 50 nucleotides in length. In another aspect, the method further comprises the step of using the determined comparative gene product information to formulate at least one of diagnosis, a prognosis or a treatment plan. In another aspect, the patient's disease state is further determined by radiological analysis of the patient's lungs. In another aspect, the method further comprises the step of determining a treated patient gene expression dataset after the patient has been treated and determining if the treated patient gene expression dataset has returned to a normal gene expression dataset thereby determining if the patient has been treated.

[0009] Another embodiment of the present invention includes a method of determining a lung disease from a patient suspected of sarcoidosis, tuberculosis, lung cancer or pneumonia comprising: obtaining a sample from the patient suspected of sarcoidosis, tuberculosis, lung cancer or pneumonia; detecting expression of 3, 4, 5, 6 or more disease genes, markers, or probes of Table 1 (SEQ ID NOS.: 1 to 1446), wherein increased expression of mRNA of upregulated sarcoidosis, tuberculosis, lung cancer and pneumonia markers of Table 1 and/or decreased expression of mRNA of downregulated sarcoidosis, tuberculosis, lung cancer or pneumonia markers of Table 1 relative to the expression of the mRNAs from a normal sample; and determining the lung disease based on the expression level of the six or more disease markers of Table 1 based on a comparison of the expression level of sarcoidosis, tuberculosis, lung cancer, and pneumonia. In one aspect, the method further comprises the step of selecting 3, 4, 5, 6 or more genes that are differentially expressed between sarcoidosis, tuberculosis, lung cancer, and pneumonia. In another aspect, the method further comprises the step of differentiating between sarcoidosis that is active sarcoidosis and inactive sarcoidosis by determining the expression levels of six or more genes, markers, or probes selected from: TMEM144; FBLN5; FBLN5; ERI1; CXCR3; GLUL; LOC728728; KLHDC8B; KCNJ15; RNF125; CCNB1IP1; PSG9; LOC100170939; QPCT; CD177; LOC400499; LOC400499; LOC100134634; TMEM88; LOC729028; EPSTI1; INSC; LOC728484; ERP27; CCDC109A; LOC729580; C2; TTRAP; ALPL; MAEA; COX10; GPR84; TRMT11; ANKRD22; MATK; TBC1D24; LILRA5; TMEM176B; CAMP; PKIA; PFTK1; TPM2; TPM2; PRKCQ; PSTPIP2; LOC129607; APRT; VAMPS; FCGR1C; SHKBP1; CD79B; SIGIRR; FKBP9L; LOC729660; WDR74; LOC646434; LOC647834; RECK; MGST1; PIWIL4; LILRB1; FCGR1B; NOC3L; ZNF83; FCGBP; SNORD13; LOC642267; GBP5; EOMES; BST1; C5; CHMP7; ETV7; ILVBL; LOC728262; GNLY; LOC388572; GATA1; MYBL1; LOC441124; LOC441124; IL12RB1; BRIX1; GAS6; GAS6; LOC100133740; GPSM1; C6orf129; IER3; MAPK14; PROK1; GPR109B; SASP; LOC728093; PROK2; CTSW; ABHD2; LOC100130775; SLITRK4; FBXW2; RTTN; TAF15; FUT7; DUSP3; LOC399715; LOC642161; LOC100129541; TCTN1; SLAMF8; TGM2; ECE1; CD38; INPP4B; ID3; CR1; CR1; TAPBP; PPAP2C; MBOAT2; MS4A2; FAM176B; LOC390183; SERPING1; LOC441743; H1F0; SOD2; LOC642828; POLB; TSPAN9; ORMDL3; FER1L3; LBH; PNKD; SLPI; SIRPB1; LOC389386; REC8; GNLY; GNLY; FOLR3; LOC730286; SKAP1; SELP; DHX30; KIAA1618; NQO2; ANKRD46; LOC646301; LOC400464; LOC100134703; C20orf106; SLC25A38; YPEL1; IL1R1;

the one or more expression pathways, selected from: EIF2 signaling and mTOR signaling pathways are indicative of active sarcoidosis; co-expression of genes in the regulation of eIF4 and p70S6K signaling pathways is indicative of pneumonia; co-expression of genes in the interferon signaling and antigen presentation pathways are indicative of tuberculosis; and co-expression of genes in the T cell signaling pathways; and other signaling pathways is indicative of lung cancer. In one aspect, the genes that are downregulated are selected from MEF2D; BHLHB2; CLC; FCER1A; SRGAP3; FLJ43093; CCR3; EMR4; ZNF792; C10orf33; CACNG6; P2RY10; GATA2; EMR4P; ESPN; EMR4; MXD4; and ZSCAN18. In another aspect, the method further comprises a method for displaying if the patient has tuberculosis, sarcoidosis, cancer or pneumonia by aggregating the expression data from the six or more genes into a single visual display of a vector of expression for tuberculosis, sarcoidosis, cancer or pneumonia. In another aspect, the method further comprises the step of detecting and evaluating 7, 8, 9, 10, 12, 15, 20, 25, 35, 50, 75, 90, 100, 125, or 144 genes for the analysis. In another aspect, the sample is a blood, peripheral blood mononuclear cells, sputum, or lung biopsy. In another aspect, the expression level comprises an mRNA expression level and is quantitated by a method selected from the group consisting of polymerase chain reaction, real time polymerase chain reaction, reverse transcriptase polymerase chain reaction, hybridization, probe hybridization and gene expression array. In another aspect, the expression level is determined using at least one technique selected from polymerase chain reaction, heteroduplex analysis, single stand conformational polymorphism analysis, ligase chain reaction, comparative genome hybridization, Southern blotting, Northern blotting, Western blotting, enzyme-linked immunosorbent assay, fluorescent resonance energy-transfer and sequencing. In another aspect, the expression level is determined by microarray analysis that comprises use of oligonucleotides that hybridize to mRNA transcripts or cDNAs for the six or more genes, and wherein the oligonucleotides are disposed or directly synthesized on the surface of a chip or wafer. In another aspect, the oligonucleotides are about 10 to about 50 nucleotides in length. In another aspect, the method further comprises the step of using the determined comparative gene product information to formulate at least one of diagnosis, a prognosis or a treatment plan. In another aspect, the patient's disease state is further determined by radiological analysis of the patient's lungs. In another aspect, the method further comprises the step of determining a treated patient gene expression dataset after the patient has been treated and determining if the treated patient gene expression dataset has returned to a normal gene or a changed gene expression dataset thereby determining if the patient has been treated. In another aspect, a non-overlapping set of genes is used to distinguish between Tb, sarcoidosis, pneumonia and lung cancer, versus, Tb, active sarcoidosis, non-active sarcoidosis, pneumonia and lung cancer are selected from Table 11, 12 or both. Yet another embodiment of the present invention includes a computer readable medium comprising computer-executable instructions for performing the methods of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] For a more complete understanding of the features and advantages of the present invention, reference is now made to the detailed description of the invention along with the accompanying figures and in which:

[0013] FIG. 1 is a heatmap showing that the pulmonary granulomatous diseases, TB and sarcoidosis, display similar transcriptional signatures (of 1446 transcripts) to each other but distinct from pneumonia and lung cancer.

[0014] FIG. 2 is a heat map showing that three dominant clusters of transcripts in the unsupervised clustering of the 1446 transcripts are associated with distinct Ingenuity Pathway Analysis canonical pathways.

[0015] FIGS. 3A and 3B (quantitative) show that sarcoidosis patients clinically classified as active sarcoidosis display similar transcriptional signatures to the TB patients but are very distinct from the transcriptional signatures of the clinically classified non-active sarcoidosis patients, which in turn resemble the healthy controls.

[0016] FIGS. 4A to 4E show that a modular analysis of the Training Set reveals the similarity of the biological pathways associated with TB and sarcoidosis (which show particularly overexpression of the IFN modules), differing from pneumonia and lung cancer (particularly overexpression of the inflammation modules). All are quantitated in FIGS. 4D and 4E.

[0017] FIGS. 5A to 5E show that a Comparison Ingenuity Pathway Analysis of the four disease groups compared to their matched controls reveals the four most significant pathways.

[0018] FIGS. 6A to 6D show that both modular analysis and molecular distance to health reveal that the blood transcriptome of the pneumonia and TB patients after successfully completing treatment is no different from that of the healthy controls; however, the sarcoidosis patients show an overexpression of inflammation genes during a clinically successful response to glucocorticoids.

[0019] FIGS. 7A to 7E show that the interferon-inducible gene expression is most abundant in the neutrophils in both TB and sarcoidosis.

[0020] FIGS. 8A and 8B are graphs with the results for the pulmonary diseases using the genes in the neutrophil module.

[0021] FIG. 9 is a 4-set Venn diagram comparing the differentially expressed genes for each disease group compared to their ethnicity and gender matched controls.

[0022] FIG. 10A is a Venn diagram comparing the gene lists used in the class prediction. FIG. 10B is a Venn diagram comparing the genes that distinguish between Tb, sarcoidosis, pneumonia and lung cancer, versus, Tb, active sarcoidosis, non-active sarcoidosis, pneumonia and lung cancer.

DETAILED DESCRIPTION OF THE INVENTION

[0023] While the making and using of various embodiments of the present invention are discussed in detail below, it should be appreciated that the present invention provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed herein are merely illustrative of specific ways to make and use the invention and do not delimit the scope of the invention.

[0024] To facilitate the understanding of this invention, a number of terms are defined below. Terms defined herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as "a", "an" and "the" are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.

[0025] The present invention provides methods, compositions, biomarkers and tests for evaluating the immunopathogenesis underlying TB and other pulmonary diseases, by comparing the blood transcriptional responses in pulmonary TB patients to those found in pulmonary sarcoidosis, pneumonia and lung cancer patients. It also provides for the first time a complete, reproducible comparison of blood transcriptional responses before and after treatment in each disease, and an examination of the transcriptional responses seen in the different leucocyte populations of the granulomatous diseases. In addition the present inventors investigated the association between the clinical heterogeneity of sarcoidosis and the observed blood transcriptional heterogeneity.

[0026] As used herein, the term "array" refers to a solid support or substrate with one or more peptides or nucleic acid probes attached to the support. Arrays typically have one or more different nucleic acid or peptide probes that are coupled to a surface of a substrate in different, known locations. These arrays, also described as "microarrays" or "gene-chips", may have 10,000; 20,000; 30,000; or 40,000 different identifiable genes based on the known genome, e.g., the human genome. These pan-arrays are used to detect the entire "transcriptome" or transcriptional pool of genes that are expressed or found in a sample, e.g., nucleic acids that are expressed as RNA, mRNA and the like that may be subjected to RT and/or RT-PCR to make a complementary set of DNA replicons. The microarray is well known in the art, for example, U.S. Pat. Nos. 5,445,934 and 5,744,305. The term also includes all the devices so called in Schena (ed.), DNA Microarrays: A Practical Approach (Practical Approach Series), Oxford University Press (1999) (ISBN: 0199637768); Nature Genet. 21(1)(suppl):1-60 (1999); and Schena (ed.), Microarray Biochip: Tools and Technology, Eaton Publishing Company/BioTechniques Books Division (2000) (ISBN: 1881299376) (relevant portions incorporated herein by reference), the disclosures of which are incorporated herein by reference in their entirety. Arrays may be produced using mechanical synthesis methods, light directed synthesis methods and the like that incorporate a combination of non-lithographic and/or photolithographic methods and solid phase synthesis methods. In one embodiment, the present invention includes simplified arrays that can include a limited number of probes, e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 144, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, or even 1,446 genes or probes in a customized or customizable microarray adapted for pulmonary disease detection, diagnosis and evaluation.

[0027] As used herein the term "biomarker" refers to a specific biochemical in the body that has a particular molecular feature to make it useful for diagnosing and measuring the progress of disease or the effects of treatment. Certain biomarkers form part of the present invention and are attached to this application as Lengthy Tables, that are included herewith and the content incorporated herein by reference. The text file Symbol-Regulation-ID.txt is 47Kb and Symbol-Sequence-ID.txt provide the list of 1446 probe sequences and genes that are associated with the majority of the same. Also included herewith is a list of 1359 genes that overlay in certain conditions as described hereinbelow.

[0028] Various techniques for the synthesis of these nucleic acid arrays have been described, e.g., fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate. Arrays may be packaged in such a manner as to allow for diagnostics or other manipulation of an all inclusive device, see for example, U.S. Pat. No. 6,955,788, relevant portions incorporated herein by reference.

[0029] As used herein, the term "disease" refers to a physiological state of an organism with any abnormal biological state of a cell. Disease includes, but is not limited to, an interruption, cessation or disorder of cells, tissues, body functions, systems or organs that may be inherent, inherited, caused by an infection, caused by abnormal cell function, abnormal cell division and the like. A disease that leads to a "disease state" is generally detrimental to the biological system, that is, the host of the disease. With respect to the present invention, any biological state, such as an infection (e.g., viral, bacterial, fungal, helminthic, etc.), inflammation, autoinflammation, autoimmunity, anaphylaxis, allergies, premalignancy, malignancy, surgical, transplantation, physiological, and the like that is associated with a disease or disorder is considered to be a disease state. A pathological state is generally the equivalent of a disease state. Disease states may also be categorized into different levels of disease state. As used herein, the level of a disease or disease state is an arbitrary measure reflecting the progression of a disease or disease state as well as the physiological response upon, during and after treatment. Generally, a disease or disease state will progress through levels or stages, wherein the effects of the disease become increasingly severe. The level of a disease state may be impacted by the physiological state of cells in the sample. As used herein, the terms "module", "modular transcriptional vectors", or "vectors of gene expression" refer to transcriptional expression data that reflects a proportion of differentially expressed genes having a common gene expression pathway (e.g., interferon inducible genes), are typically expressed only or predominantly in a certain cell type (e.g., genes expressed by neutrophils), or are grouped into a module of genes to yield, in the aggregate, a single vector of gene expression, such that the overall expression is expressed as a single vector that includes both a direction (under expressed or over expressed) and intensity of the under or over expression. For example, for each module the proportion of transcripts differentially expressed between at least two groups (e.g., healthy subjects versus patients, or certain patients of a first disease versus a group of patients with a second disease) is determined. The vector of expression is derived from the comparison of two or more groups of samples. The first analytical step is used for the selection of disease-specific sets of transcripts within each module. Next, there is the "expression level." The group comparison for a given disease provides the list of differentially expressed transcripts for each module. It was found that different diseases yield different subsets of modular transcripts. With this expression level it is then possible to calculate a vector of expression for each of the module(s) for a single sample by averaging expression values of disease-specific subsets of genes identified as being differentially expressed. This approach permits the generation of maps of modular expression vectors for a single sample, e.g., those described in the module maps disclosed herein. These vectors of expression or module maps represent an averaged expression level for each module (instead of a proportion of differentially expressed genes) that can be derived for each sample. An example of the vector of gene expression is shown in, e.g., FIG. 6A.
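The two module-level quantities described in the definition of "module" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the inventors' implementation: the module names, transcript identifiers (t1 to t6), expression values and helper names (module_proportions, module_vector) are all hypothetical. The first function gives, per module, the proportion of transcripts flagged as over- or under-expressed by a group comparison; the second averages a single sample's expression values over the disease-specific subset of each module.

```python
from statistics import mean

# Hypothetical modules: module name -> transcript identifiers (not from the source).
MODULES = {"interferon": ["t1", "t2", "t3", "t4"], "inflammation": ["t5", "t6"]}
# Hypothetical result of a group comparison (e.g., patients versus healthy controls).
UP = {"t1", "t2", "t5"}
DOWN = {"t4"}
# Hypothetical normalised expression values for one sample.
SAMPLE = {"t1": 2.0, "t2": 4.0, "t3": 1.0, "t4": 3.0, "t5": 3.0, "t6": 1.0}
# Disease-specific subsets: the differentially expressed transcripts per module.
SUBSETS = {"interferon": {"t1", "t2", "t4"}, "inflammation": {"t5"}}


def module_proportions(modules, up, down):
    """Per module, the fraction of transcripts that are over- or under-expressed."""
    result = {}
    for name, transcripts in modules.items():
        if not transcripts:  # skip empty modules to avoid dividing by zero
            continue
        n = len(transcripts)
        result[name] = {
            "over": sum(t in up for t in transcripts) / n,
            "under": sum(t in down for t in transcripts) / n,
        }
    return result


def module_vector(sample, modules, disease_subsets):
    """Per module, the mean expression of the disease-specific subset in one sample."""
    vector = {}
    for name, transcripts in modules.items():
        subset = disease_subsets.get(name, set())
        values = [sample[t] for t in transcripts if t in subset and t in sample]
        # A module with no selected transcripts has no vector for this sample.
        vector[name] = mean(values) if values else None
    return vector


if __name__ == "__main__":
    print(module_proportions(MODULES, UP, DOWN))
    print(module_vector(SAMPLE, MODULES, SUBSETS))
```

The proportion view needs two or more groups of samples, while the averaged vector can be computed for one sample once the disease-specific subsets are fixed, which matches the distinction drawn in the text.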

[0030] Using the present invention it is possible to identify and distinguish pulmonary diseases not only at the module level, but also at the gene-level; i.e., two, three or four diseases can have for certain modules the same vector (identical proportion of differentially expressed transcripts, identical "polarity"), but the gene composition of the vector can still be disease-specific, and vice versa. Gene-level expression provides the distinct advantage of greatly increasing the resolution of the analysis.

[0031] Gene expression monitoring systems for use with the present invention may include customized gene arrays with a limited and/or basic number of genes that are specific and/or customized for the one or more target diseases. Unlike the general, pan-genome arrays that are in customary use, the present invention provides for not only the use of these general pan-arrays for retrospective gene and genome analysis without the need to use a specific platform, but more importantly, it provides for the development of customized arrays that provide an optimal gene set for analysis without the need for the thousands of other, non-relevant genes. One distinct advantage of the optimized arrays and modules of the present invention over the existing art is a reduction in the financial costs (e.g., cost per assay, materials, equipment, time, personnel, training, etc.), and more importantly, the environmental cost of manufacturing pan-arrays where the vast majority of the data is irrelevant. The modules of the present invention allow for the first time the design of simple, custom arrays that provide optimal data with the least number of probes while maximizing the signal to noise ratio. By reducing the total number of genes for analysis, it is possible to, e.g., eliminate the need to manufacture thousands of expensive platinum masks for photolithography during the manufacture of pan-genetic chips that provide vast amounts of irrelevant data. Using the present invention it is possible to completely avoid the need for microarrays if the limited probe set(s) of the present invention are used with, e.g., digital optical chemistry arrays, ball bead arrays, beads (e.g., Luminex), multiplex PCR, quantitative PCR, run-on assays, Northern blot analysis, or even, for protein analysis, e.g., Western blot analysis, 2-D and 3-D gel protein expression, MALDI, MALDI-TOF, fluorescence activated cell sorting (FACS) (cell surface or intracellular), enzyme linked immunosorbent assays (ELISA), chemiluminescence studies, enzymatic assays, proliferation studies or any other method, apparatus and system for the determination and/or analysis of gene expression that are readily commercially available.

[0032] As used herein, the term "differentially expressed" refers to the measurement of a cellular constituent (e.g., nucleic acid, protein, enzymatic activity and the like) that varies in two or more samples, e.g., between a disease sample and a normal sample. The cellular constituent may be on or off (present or absent), upregulated relative to a reference or downregulated relative to the reference. For use with gene chips or gene-arrays, differential gene expression of nucleic acids, e.g., mRNA or other RNAs (miRNA, siRNA, hnRNA, rRNA, tRNA, etc.) may be used to distinguish between cell types or nucleic acids. Most commonly, the measurement of the transcriptional state of a cell is accomplished by quantitative reverse transcriptase (RT) and/or quantitative reverse transcriptase-polymerase chain reaction (RT-PCR), genomic expression analysis, post-translational analysis, modifications to genomic DNA, translocations, in situ hybridization and the like.

[0033] As used herein, the terms "therapy" or "therapeutic regimen" refer to those medical steps taken to alleviate or alter a disease state, e.g., a course of treatment intended to reduce or eliminate the effects or symptoms of a disease using pharmacological, surgical, dietary and/or other techniques. A therapeutic regimen may include a prescribed dosage of one or more drugs or surgery. Therapies will most often be beneficial and reduce the disease state but in many instances the effect of a therapy will have non-desirable or side-effects. The effect of therapy will also be impacted by the physiological state of the host, e.g., age, gender, genetics, weight, other disease conditions, etc.

[0034] As used herein, the term "pharmacological state" or "pharmacological status" refers to those samples from diseased individuals that will be, are and/or were treated with one or more drugs, surgery and the like that may affect the pharmacological state of one or more nucleic acids in a sample, e.g., newly transcribed, stabilized and/or destabilized as a result of the pharmacological intervention. The pharmacological state of a sample relates to changes in the biological status before, during and/or after drug treatment and may serve a diagnostic or prognostic function, as taught herein. Some changes following drug treatment or surgery may be relevant to the disease state and/or may be unrelated side-effects of the therapy. Changes in the pharmacological state are the likely results of the duration of therapy, types and doses of drugs prescribed, degree of compliance with a given course of therapy, and/or un-prescribed drugs ingested.

[0035] As used herein, the term "biological state" refers to the state of the transcriptome (that is, the entire collection of RNA transcripts) of the cellular sample isolated and purified for the analysis of changes in expression. The biological state reflects the physiological state of the cells in the blood sample by measuring the abundance and/or activity of cellular constituents, characterizing according to morphological phenotype or a combination of the methods for the detection of transcripts. As used herein, the term "expression profile" refers to the relative abundance of RNA, DNA abundances or activity levels. The expression profile can be a measurement, for example, of the transcriptional state or the translational state by any number of methods and using any of a number of gene-chips, gene arrays, beads, multiplex PCR, quantitative PCR, run-on assays, Northern blot analysis, or using RNA-seq, nanostring, nanopore RNA sequencing, etc., using apparatus and systems for the determination and/or analysis of gene expression that are readily commercially available.

[0036] As used herein the term "gene" is used to refer to a functional protein, polypeptide or peptide-encoding unit. As will be understood by those in the art, this functional term includes both genomic sequences, cDNA sequences, or fragments or combinations thereof, as well as gene products, including those that may have been altered by the hand of man. Purified genes, nucleic acids, protein and the like are used to refer to these entities when identified and separated from at least one contaminating nucleic acid or protein with which it is ordinarily associated.

[0037] As used herein, the term "transcriptional state" of a sample includes the identities and relative abundances of the RNA species, especially mRNAs present in the sample. The entire transcriptional state of a sample, that is the combination of identity and abundance of RNA, is also referred to herein as the transcriptome. Generally, a substantial fraction of all the relative constituents of the entire set of RNA species in the sample are measured.
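The definition of "differentially expressed" given above (a constituent that is upregulated or downregulated relative to a reference) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the 1.5-fold cut-off, the linear-scale positive intensity values, the helper names (call_regulation, regulation_table) and the placeholder symbol GENE_X are hypothetical and are not taken from this application, and the sketch uses a plain fold change with no statistical filter. TMEM144 and CXCR3 are borrowed from Table 1 only as labels; the numbers attached to them are invented.

```python
from statistics import mean


def call_regulation(disease_values, control_values, min_fold=1.5):
    """Label one transcript UP, DOWN or UNCHANGED relative to the control group.

    Assumes linear-scale, strictly positive intensities. The min_fold cut-off
    is an illustrative assumption, not a threshold stated in the source.
    """
    fold = mean(disease_values) / mean(control_values)
    if fold >= min_fold:
        return "UP"
    if fold <= 1 / min_fold:
        return "DOWN"
    return "UNCHANGED"


def regulation_table(disease, control, min_fold=1.5):
    """Apply call_regulation to every transcript measured in both groups."""
    return {
        symbol: call_regulation(disease[symbol], control[symbol], min_fold)
        for symbol in disease
        if symbol in control
    }


# Invented intensities for two patients and two controls per transcript.
DISEASE = {"TMEM144": [9.0, 11.0], "CXCR3": [2.0, 2.0], "GENE_X": [5.0, 5.0]}
CONTROL = {"TMEM144": [4.0, 6.0], "CXCR3": [7.0, 9.0], "GENE_X": [5.0, 5.0]}

if __name__ == "__main__":
    for symbol, call in regulation_table(DISEASE, CONTROL).items():
        print(symbol, call)
```

Running one such comparison per disease group against its matched controls would yield a symbol-by-comparison grid of UP and DOWN calls of the same shape as Table 1, although a real analysis would add normalisation and a significance test.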

[0038] Regarding the "expression level," the group comparison for a given disease provides the list of differentially expressed transcripts. It was found that different diseases yield different subsets of gene transcripts as demonstrated herein.

[0039] Gene expression monitoring systems for use with the present invention may include customized gene arrays with a limited and/or basic number of genes that are specific and/or customized for the one or more target diseases. Unlike the general, pan-genome arrays that are in customary use, the present invention provides for not only the use of these general pan-arrays for retrospective gene and genome analysis without the need to use a specific platform, but more importantly, it provides for the development of customized arrays that provide an optimal gene set for analysis without the need for the thousands of other, non-relevant genes. One distinct advantage of the optimized arrays and gene sets of the present invention over the existing art is a reduction in the financial costs (e.g., cost per assay, materials, equipment, time, personnel, training, etc.), and more importantly, the environmental cost of manufacturing pan-arrays where the vast majority of the data is irrelevant. By reducing the total number of genes for analysis, it is possible to, e.g., eliminate the need to manufacture thousands of expensive platinum masks for photolithography during the manufacture of pan-genetic chips that provide vast amounts of irrelevant data. Using the present invention it is possible to completely avoid the need for microarrays if the limited probe set(s) of the present invention are used with, e.g., digital optical chemistry arrays, ball bead arrays, multiplex PCR, quantitative PCR, "RNA-seq" for measuring mRNA levels using next-generation sequencing technologies, nanostring-type technologies or any other method, apparatus and system for the determination and/or analysis of gene expression that are readily commercially available.

[0040] The "molecular fingerprinting system" of the present invention may be used to facilitate and conduct a comparative analysis of expression in different cells or tissues, different subpopulations of the same cells or tissues, different physiological states of the same cells or tissue, different developmental stages of the same cells or tissue, or different cell populations of the same tissue against other diseases and/or normal cell controls. In some cases, the normal or wild-type expression data may be from samples analyzed at or about the same time or it may be expression data obtained or culled from existing gene array expression databases, e.g., public databases such as the NCBI Gene Expression Omnibus database.

[0041] As used herein, the term "differentially expressed" refers to the measurement of a cellular constituent (e.g., nucleic acid, protein, enzymatic activity and the like) that varies in two or more samples, e.g., between a disease sample and a normal sample. The cellular constituent may be on or off (present or absent), upregulated relative to a reference or downregulated relative to the reference. For use with gene chips or gene-arrays, differential gene expression of nucleic acids, e.g., mRNA or other RNAs (miRNA, siRNA, hnRNA, rRNA, tRNA, etc.) may be used to distinguish between cell types or nucleic acids. Most commonly, the measurement of the transcriptional state of a cell is accomplished by quantitative reverse transcriptase (RT) and/or quantitative reverse transcriptase-polymerase chain reaction (RT-PCR), genomic expression analysis, post-translational analysis, modifications to genomic DNA, translocations, in situ hybridization and the like.

[0042] The skilled artisan will appreciate readily that samples may be obtained from a variety of sources including, e.g., single cells, a collection of cells, tissue, cell culture and the like. In certain cases, it may even be possible to isolate sufficient RNA from cells found in, e.g., urine, blood, saliva, tissue or biopsy samples and the like. In certain circumstances, enough cells and/or RNA may be obtained from: mucosal secretion, feces, tears, blood plasma, peritoneal fluid, interstitial fluid, intradural, cerebrospinal fluid, sweat or other bodily fluids. The nucleic acid source, e.g., from tissue or cell sources, may include a tissue biopsy sample, one or more sorted cell populations, cell culture, cell clones, transformed cells, biopsies or a single cell. The tissue source may include, e.g., brain, liver, heart, kidney, lung, spleen, retina, bone, neural, lymph node, endocrine gland, reproductive organ, blood, nerve, vascular tissue, and olfactory epithelium.

[0043] The present invention includes the following basic components, which may be used alone or in combination, namely, one or more data mining algorithms, one novel algorithm specifically developed for this TB treatment monitoring, the Temporal Molecular Response; the characterization of blood leukocyte transcriptional gene sets; the use of aggregated gene transcripts in multivariate analyses for the molecular diagnosis/prognosis of human diseases; and/or visualization of transcriptional gene set-level data and results. Using the present invention it is also possible to develop and analyze composite transcriptional markers. The composite transcriptional markers for individual patients in the absence of control sample analysis may be further aggregated into a reduced multivariate score.

[0044] An explosion in data acquisition rates has spurred the development of mining tools and algorithms for the exploitation of microarray data and biomedical knowledge. Approaches aimed at uncovering the function of transcriptional systems constitute promising methods for the identification of robust molecular signatures of disease. Indeed, such analyses can transform the perception of large-scale transcriptional studies by taking the conceptualization of microarray data past the level of individual genes or lists of genes.

[0045] The present inventors have recognized that current microarray-based research is facing significant challenges with the analysis of data that are notoriously "noisy," that is, data that is difficult to interpret and does not compare well across laboratories and platforms. A widely accepted approach for the analysis of microarray data begins with the identification of subsets of genes differentially expressed between study groups. Next, the users subsequently try to "make sense" out of resulting gene lists using the novel Temporal Molecular Response discovery algorithms and existing scientific knowledge and by validating in independent sample sets and in different microarray analyses.

[0046] Pulmonary tuberculosis (PTB) is a major and increasing cause of morbidity and mortality worldwide caused by Mycobacterium tuberculosis (M. tuberculosis). However, the majority of individuals infected with M. tuberculosis remain asymptomatic, retaining the infection in a latent form, and it is thought that this latent state is maintained by an active immune response. Blood is the pipeline of the
immune system, and as such is the ideal biologic material from which the health and immune status of an individual can be established.

[0047] Blood represents a reservoir and a migration compartment for cells of the innate and the adaptive immune systems, including neutrophils, dendritic cells and monocytes, or B and T lymphocytes, respectively, which during infection will have been exposed to infectious agents in the tissue. For this reason whole blood from infected individuals provides an accessible source of clinically relevant material where an unbiased molecular phenotype can be obtained using gene expression microarrays, as used for the study of cancer in tissues, and of autoimmunity, inflammation and infectious disease in blood or tissue. Microarray analyses of gene expression in blood leucocytes have identified diagnostic and prognostic gene expression signatures, which have led to a better understanding of mechanisms of disease onset and responses to treatment. These microarray approaches have been attempted for the study of active and latent TB but as yet have yielded small numbers of differentially expressed genes only, and in relatively small numbers of patients, therefore not reaching statistical significance, which may not be robust enough to distinguish TB from other inflammatory and infectious diseases. The present inventors recognized that a neutrophil driven blood transcriptional signature in active TB patients was missing in the majority of latent TB individuals and in healthy controls. For this description see, also, the study of Berry et al., 2010 (5), by the present inventors. This signature of active TB was reflective of lung radiographic disease and was diminished after 2 months of treatment (5), and more recently the present inventors have shown that the blood transcriptional signature of TB was diminished as early as 2 weeks after commencement of treatment (12). The signature was dominated by interferon-inducible genes, and at a modular level the active TB signature (5, 12) was distinct from other infectious or autoimmune diseases (5).

[0048] In the present findings, which form the basis of this application, the blood transcriptional profiles of the pulmonary granulomatous diseases (TB and sarcoidosis) clustered together but distinctly from the similar pulmonary diseases pneumonia and lung cancer.

[0049] It has previously been shown that TB and sarcoidosis have similar transcriptional profiles; however, no published studies have determined if this similar blood gene expression profile is due to generalized transcriptional activity associated with pulmonary diseases or due to specific host responses associated with TB and sarcoidosis. Therefore, we recruited three cohorts of TB and sarcoidosis patients (Training, Test and Validation Sets) alongside patients with similar pulmonary diseases, community acquired pneumonia and lung cancer. On average the sarcoidosis patients presented with a milder and more chronic presentation than the TB and pneumonia patients. There was little difference in the demographics and clinical characteristics of the participants in the Training and Test Sets.

[0050] Unbiased analysis followed by unsupervised hierarchical clustering of the blood transcriptional profiles from all the Training Set participants clearly demonstrated that the TB and sarcoidosis patients' transcriptional profiles clustered together but distinctly from the pneumonia and cancer patients' transcriptional profiles, which themselves clustered together (3422 transcripts). Adding a statistical filter generated 1446 differentially expressed transcripts. Applying unsupervised hierarchical clustering of the 1446 transcripts and the Training Set samples again showed the same clustering pattern. This finding was verified in an independent cohort, the Test Set, which likewise showed the TB and most sarcoidosis patients clustered together while the pneumonia and lung cancer patients also clustered together but separately from the granulomatous diseases (FIG. 1). Clustering was not influenced by ethnicity or gender (data not shown).

[0051] FIG. 1. The pulmonary granulomatous diseases, TB and sarcoidosis, display similar transcriptional signatures to each other but distinct from pneumonia and lung cancer. 1446 transcripts were differentially expressed in the whole blood of the Training Set healthy controls, pulmonary TB patients, pulmonary sarcoidosis patients, pneumonia patients and lung cancer patients. The clustering of the 1446 transcripts was tested in a cohort independent of the one from which they were derived, the Test Set. The heatmap shows the transcripts and patients' profiles as organised by the unbiased algorithm of unsupervised hierarchical clustering. A dotted line is added to the heatmap to help visualisation of the main clusters generated by the clustering algorithm. Transcript intensity values are normalised to the median of all transcripts. Red transcripts are relatively over-abundant and blue transcripts under-abundant. The coloured bar at the bottom of the heatmap indicates which group the profile belongs to.

TABLE 1
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.

Symbol         Cancer vs  Pneumonia vs  Sarcoidosis vs  Tb vs    SEQ ID
               Control    Control       Control         Control  NO:
TMEM144        UP         UP            UP              UP       1
FBLN5          DOWN       DOWN          DOWN            DOWN     2
FBLN5          DOWN       DOWN          DOWN            DOWN     3
ERI1           UP         UP            UP              UP       4
CXCR3          DOWN       DOWN          DOWN            DOWN     5
GLUL           UP         UP            UP              UP       6
LOC728728      UP         UP            UP              UP       7
KLHDC8B        UP         UP            UP              UP       8
KCNJ15         UP         UP            UP              UP       9
RNF125         DOWN       DOWN          DOWN            DOWN     10
CCNB1IP1       DOWN       DOWN          DOWN            DOWN     11
PSG9           UP         UP            UP              UP       12
LOC100170939   UP         UP            UP              UP       13
QPCT           UP         UP            UP              UP       14
CD177          UP         UP            UP              UP       15
LOC400499      UP         UP            UP              UP       16
LOC400499      UP         UP            UP              UP       17
LOC100134634   UP         UP            UP              UP       18
TMEM88         UP         UP            UP              UP       19
LOC729028      UP         UP            DOWN            UP       20
EPSTI1         UP         UP            UP              UP       21
NSC            UP         UP            UP              UP       22
LOC728484      DOWN       DOWN          DOWN            DOWN     23
ERP27          DOWN       UP            DOWN            DOWN     24
CCDC109A       UP         UP            UP              UP       25
LOC729580      UP         UP            UP              UP       26
C2             DOWN       UP            UP              UP       27
TTRAP          UP         UP            DOWN            UP       28
ALPL           UP         UP            DOWN            UP       29
MAEA           UP         UP            UP              UP       30
COX10          DOWN       DOWN          DOWN            DOWN     31
GPR84          UP         UP            UP              UP       32
PHF20L1        UP         UP            UP              UP       33
TRMT11         DOWN       DOWN          DOWN            DOWN     34
ANKRD22        UP         UP            UP              UP       35
MATK           DOWN       DOWN          DOWN            DOWN     36
TBC1D24        UP         UP            UP              UP       37
LILRA5         UP         UP            UP              UP       38
TMEM176B       UP         UP            UP              UP       39
CAMP           UP         UP            UP              UP       40

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  Tb vs Control  SEQ ID NO:

PKIA DOW N DOW N DOWN 41 LOC399715 C DOWN 12 PFTK1 42 LOC642161 OWN DOWN 13 TPM2 O O 43 LOC100129541 C 14 TPM2 44 TCTN1 OWN 15 PRKCQ 45 SLAMF8 C 16 PSTPIP2 46 TGM2 OWN 17 LOC1296O7 47 ECE1 C 18 APRT 48 19 WAMPS 49 2O FCGR1C 50 WN 21 SHKBP1 51 DPM2 22 CD79B 52 CR1 C 23 SIGIRR W W 53 CR1 24 FKBP9L. S4 TAPBP WN 25 LOC729660 55 PPAP2C OWN 26 WDR74 56 MBOAT2 27 LOC646434 57 MS4A2 28 LOC647834 58 FAM176B 29 RECK 59 LOC3901.83 O 30 MGST1 60 RPLP1 31 PIWIL4 61 SERPING1 C 32 LILRB1 62 LOC441743 33 FCGR1B 63 H1FO 34 NOC3L DOWN DOWN DOWN 64 SOD2 35 ZNF83 DOWN DOWN DOWN 65 LOC642828 WN 36 FCGBP DOWN DOWN DOWN 66 POLB 37 SNORD13 DOWN DOWN DOWN 67 TSPAN9 38 LOC642267 UP UP UP 68 ORMDL3 39 UP UP UP 69 FER1L3 40 GBP5 UP UP UP 70 LBH E. W N 41 EOMES DOWN DOWN DOWN DOWN 71 PNKD 42 BST1 UP UP UP 72 SLPI 43 UP UP UP 73 SIRPB1 44 CHMP7 DOWN DOWN DOWN DOWN 74 LOC389386 45 ETV 7 UP UP UP UP 75 REC8 46 LOC400304 DOWN DOWN DOWN DOWN 76 GNLY 47 LVBL DOWN DOWN DOWN DOWN 77 GNLY WN WN WN 48 LOC728262 UP UP UP UP 78 FOLR3 49 GNLY DOWN DOWN DOWN DOWN 79 LOC730286 50 LOC388572 UP UP UP UP 8O SKAP1 W N W N 51 GATA1 DOWN DOWN UP UP 81 SELP C 52 MYBL1 DOWN DOWN DOWN DOWN 82 DHX30 W N 53 SELM DOWN DOWN DOWN DOWN 83 KIAA1618 S4 LOC441124 UP UP UP UP 84 NQO2 55 LOC441124 UP UP UP UP 85 SF1 56 L12RB1 DOWN DOWN UP UP 86 ANKRD46 WN 57 DOWN DOWN DOWN DOWN 87 LOC646,301 58 BRIX1 DOWN DOWN DOWN DOWN 88 LOC400464 W N OWN 59 GAS6 OWN UP UP UP 89 LOC100134.703 60 GAS6 C UP UP UP 90 C2OORF106 61 LOC10O133740 C UP UP UP 91 ZNF683 62 GPSM1 DOWN DOWN DOWN 92 SLC25A38 63 UP UP UP 93 YPEL1 64 DOWN DOWN DOWN 94 IL1R1 65 C UP UP UP 95 EPHA 66 ER3 UP UP UP 96 CHD6 67 MAPK14 UP UP UP 97 LIMK2 68 PROK1 UP UP UP 98 LOC643733 69 GPR109B UP UP UP 99 LOC441SSO 70 SASP UP UP UP 1OO MGC3O20 s N 71 LOC728.093 UP UP UP 101 ANKRD9 72 PROK2 UP DOWN UP 102 NOD2 73 CTSW DOWN DOWN DOWN 103 N 74 ABHD2 UP UP UP 104 MCTP1 ow 75 LOC10O130775 DOWN DOWN DOWN 105 BANK1 76 SLITRK4 UP UP UP 106 ZNF30 yN 77 FBXW2 UP UP UP 107 CTTN 78 RTTN DOWN 
DOWN DOWN 108 PTCRA 79 TAF15 UP DOWN DOWN 109 FBXO7 8O FUTT C U U U 110 FBXO7 C 81 DUSP3 C U U U 111 ABLIM1 OWN 82

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

LAMP3 DOWN C 83 TYW3 DOWN DOWN DOWN DOWN 2S4 CEBPE UP C 84 BTLA DOWN DOWN DOWN DOWN 255 LOC646909 DOWN 85 SLC24A4 UP UP UP UP 2S6 BCL11B DOWN 86 DOWN DOWN DOWN DOWN 257 TRIMS8 DOWN 8 87 NCALD DOWN DOWN DOWN DOWN 258 SAMD3 DOWN 88 ORAI2 UP UP UP UP 259 SAMD3 DOWN 89 TGB3BP DOWN DOWN DOWN DOWN 260 MYOF UP 90 GYPE UP UP UP UP 261 TTPAL UP 91 DOCKS UP UP UP UP 262 LOC642934 DOWN 92 RASGRP4 UP UP 263 UP 93 LOC339290 DOWN DOWN 264 SNORA28 UP 94 PRF1 DOWN DOWN 26S FLJ32255 UP 95 TGFBR3 DOWN DOWN 266 DOWN 96 LGALS9 UP C 267 LOC642O73 DOWN 97 LGALS9 UP C 268 CAMKK2 UP 98 BATF2 UP 269 OAS2 UP 99 MGCS7346 DOWN 270 RASGRP1 DOWN 2OO TXK WN DOWN WN 271 CAPG UP 2O1 DHX58 DOWN 272 LOC648343 DOWN 2O2 EPB41L3 C 273 CETP UP 2O3 LOC10O132499 OW N 274 CETP UP 204 OC100129674 C 275 CXCR7 DOWN 205 DPD5 W N 276 UBASH3A DOWN 2O6 CP2 277 LOC28.4648 DOWN 2O7 3AR1 278 L1R2 UP C 208 POB48R 279 AGK DOWN 209 TRN 28O GTPBP8 DOWN 210 LC2A14 WW NN 281 LEF1 DOWN 211 LEC4D 282 LEF1 DOWN 212 KM2 283 GPR109A UP 213 DCAS 284 FI35 UP 214 ACNA1E 285 UP 215 SBPL3 W N 286 UP 216 S.LC22A15 WW NN 287 SP4 DOWN DOWN DOWN 217 VPREB3 W N W N 288 L2RB DOWN DOWN DOWN 218 LOC642780 C C C 289 ABLIM1 OWN DOWN DOWN 219 MEGF6 290 TAPBP C C 220 WN WN 291 MAL OWN OWN 221 DOWN DOWN 292 TCEA3 WN OWN OWN WN 222 DOWN DOWN 293 C C C :C KREMEN1 223 DOWN DOWN 294 KREMEN1 224 C C C 295 VNN1 C 225 OWN 296 GBP1 226 WN y N OWN y N 297 GBP1 WN 227 C C C 298 UBE2C C 228 299 DET1 W 229 PGRIP1 W N W N 3OO ANKRD36 WN N W N WN 230 PR160 301 DEFA4 C 231 MTC1 3O2 GCH1 232 BCA2 WN 303 L7R WN WW N WN 233 EACAM1 3O4 TMCO3 WN 234 EACAM1 305 FBXO6 235 LJ42957 306 LACTB 236 AH2 307 LOC730953 237 DAH2 3O8 LOC285296 238 3ORF18 WN N 309 L18R1 239 TAGLN 310 C C C 240 LCN2 311 PRR5 DOWN C 241 RELB 312 LOC4OOO61 DOWN DOWN 242 NR12 313 TSEN2 DOWN DOWN 243 BEND7 314 MGC15763 : DOWN DOWN : 244 W N W N 315 SH3YL1 DOWN DOWN DOWN DOWN 245 9 OWN 316 ZNF337 DOWN DOWN DOWN DOWN 246 DUT DOWN DOWN 317 AFF3 DOWN DOWN DOWN DOWN 247 SETD6 DOWN DOWN 318 TYMS UP UP UP 
UP 248 DOWN DOWN 319 ZCCHC14 DOWN DOWN DOWN DOWN 249 LOC100131572 DOWN DOWN 32O SLC6A12 UP UP UP UP 250 TNRC6A DOWN DOWN 321 LY6E DOWN UP UP UP 251 LOC399744 UP UP 322 KLF12 DOWN DOWN DOWN DOWN 252 MAPK13 R UP UP 323 LOC10O132317 UP UP UP UP 253 TAP2 U P UP UP 324


TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

UPRT DOWN DOWN DOWN 608 OAS3 DOWN C UP UP 679 UP UP DOWN 609 PRRS DOWN OW UP DOWN 68O PLEKHA1 DOWN DOWN DOWN 610 TMEM194 DOWN OW DOWN DOWN 681 GIMAP7 DOWN DOWN DOWN 611 MS4A1 DOWN OW S DOWN DOWN 682 CACNA2D3 DOWN DOWN DOWN 612 NRSN2 UP C UP UP 683 DDX10 DOWN DOWN DOWN 613 MTHFD2 UP C UP UP 684 RPL23A DOWN DOWN DOWN 614 LOC400793 UP C DOWN UP 685 C2ORF44 DOWN DOWN DOWN 615 CEACAM1 UP C UP UP 686 LSP UP C 616 RPL37 DOWN OWN DOWN DOWN 687 C7ORF53 UP 617 APP UP C DOWN DOWN 688 LOC10O130905 618 RRBP1 UP C UP UP 689 DNAJCS 619 SLCO4C1 UP C DOWN DOWN 690 SLAIN1 6W N 62O XAF1 DOWN OW N UP UP 691 DKN1C C 621 XAF1 DOWN C UP UP 692 KAP7 W N 622 SLC2A6 DOWN C UP UP 693 ATL1 623 ZNF831 DOWN OW DOWN DOWN 694 RELD1 624 ZNF831 DOWN OW DOWN DOWN 695 NHIT6 WN 625 POLR1C DOWN O DOWN DOWN 696 626 GLT1D1 UP C UP UP 697 627 WDR UP C UP UP 698 WN 628 FITS UP C UP UP 699 M A. 629 CSTA UP C UP UP 700 s 630 SNEHG8 DOWN OWN DOWN DOWN 701 D KSRA P2 631 TOP1MT DOWN WN DOWN DOWN 702 { 632 UPP1 UP C UP UP 703 WN WN 633 SYTL2 DOWN DOWN DOWN DOWN 704 E3 634 LOC440359 DOWN UP UP 705 635 KLRB1 DOWN DOWN DOWN 706 i 636 MTMR3 C UP UP 707 637 S1PR1 OWN DOWN DOWN 708 KIR2DL3 yN 638 FYB C UP UP 709 C19ORF59 g 639 CDC2O C UP UP 710 NRG1 640 MEX3C DOWN DOWN 711 PPP2R2B 641 FAM168B WN DOWN DOWN 712 CDK5RAP2 642 C2OORF107 C UP UP 713 PLSCR1 643 SLC4A7 DOWN DOWN 71.4 UBL7 644 CD79B DOWN DOWN 715 HES4 645 FAM84B DOWN DOWN 716 ZNF256 646 LOC10O134688 UP UP 717 DKFZP761E198 647 LOC651738 UP UP 718 SAMD14 648 PLAGL1 UP UP 719 BAG3 649 TIMM10 UP UP 720 PARP14 6SO LOC641710 UP UP 721 MS4A7 651 TRAFS DOWN DOWN 722 ECHDC3 652 TAP1 UP UP 723 OCLAD2 653 FCRL2 DOWN DOWN 724 LOC90925 WN 654 SRC UP UP 725 RGLA 655 RALGAPA1 726 PARP9 656 OCIAD2 WN WN 727 PARP9 657 PON2 DOWN DOWN 728 CD151 658 LOC73OO29 DOWN DOWN 729 SAAL1 659 LOC10O134768 C 730 LOC388076 WN 660 LOC100134241 OW N 731 SIGLECS 661 LOC26010 WN 732 LRIG1 662 PLA2G12A 733 PTGDR 663 BACH1 734 PTGDR 664 DSC1 735 NBPF8 665 NOB1 DOWN 736 NHS 666 LOC645693 DOWN 737 
ACSL1 667 LOC643313 DOWN 738 668 BTBD11 DOWN 739 SNX2O 669 TMEM169 UP 740 F2RL1 670 REPS2 UP 741 F2RL1 671 ZNF23 DOWN 742 PARP12 DOWN 672 C18ORF55 WN DOWN WN 743 LOC441506 DOWN DOWN DOWN 673 APOL2 UP 744 MFGE8 DOWN DOWN DOWN 674 APOL2 UP 745 SERPINA10 DOWN DOWN DOWN 675 PASK W N DOWN 746 FAM69A DOWN DOWN DOWN 676 FER1L3 UP 747 L4R UP DOWN UP 677 U2AF1 DOWN DOWN 748 KIAA1671 DOWN DOWN DOWN 678 LOC285359 DOWN DOWN 749 SIGLEC14 UP DOWN 750

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

ARL1 DOWN DOWN DOWN DOWN 751 TGB3 UP UP UP UP 822 C19ORF62 DOWN DOWN UP DOWN 752 DHRS9 UP UP UP UP 823 NCR3 DOWN DOWN DOWN DOWN 753 PLEKHF1 DOWN DOWN DOWN DOWN 824 U C UP UP UP 754 UP UP UP UP 825 HOXB2 DOWN DOWN DOWN DOWN 755 DOWN UP UP UP 826 RNF135 UP UP UP UP 756 UP UP UP UP 827 FIT1 UP UP UP UP 757 UP UP DOWN UP 828 GCAT UP DOWN UP UP 758 UP UP UP UP 829 KLF12 DOWN DOWN DOWN DOWN 759 DOWN UP UP UP 830 LILRB2 DOWN UP UP UP 760 DOWN DOWN DOWN DOWN 831 LOC72883S DOWN DOWN DOWN DOWN 761 DOWN DOWN DOWN DOWN 832 GSN UP UP UP UP 762 DOWN DOWN DOWN DOWN 833 LOC10OOO8589 UP DOWN DOWN UP 763 DOWN DOWN 834 LOC10OOO8589 UP UP DOWN UP 764 WN DOWN WN DOWN 835 FLJ14213 DOWN DOWN UP UP 765 836 SH2D3C UP UP UP UP 766 837 LOC100.133177 UP UP UP UP 767 C C 838 TMEM176A UP UP UP UP 768 839 HIST2H2AB UP UP UP UP 769 WN WN 840 KIAA1618 UP UP UP UP 770 C C 841 CMTMS UP UP UP UP 771 842 C21 ORF2 DOWN DOWN DOWN DOWN 772 WN WN 843 CREBS UP UP UP UP 773 C C 84.4 FAS UP UP UP UP 774 C C 845 MTF1 UP UP UP UP 775 846 RSAD2 UP UP UP UP 776 847 ANPEP UP UP UP UP 777 SS 848 C14ORF179 DOWN DOWN DOWN DOWN 778 849 TXNL4B UP UP UP UP 779 HLA-DRB3 8SO MYL9 UP UP UP UP 78O SESN1 851 MYL9 UP UP UP UP 781 LOC34.7376 852 LOC10O130828 UP UP UP UP 782 P2RY14 853 LOC391019 DOWN DOWN DOWN DOWN 783 P2RY14 854 TGA2B UP UP UP UP 784 P2RY14 855 KLRC3 DOWN DOWN DOWN DOWN 785 856 RASGRP2 DOWN DOWN DOWN DOWN 786 857 NDST1 UP UP UP UP 787 858 LOC388344 DOWN DOWN DOWN DOWN 788 859 FI6 DOWN UP UP UP 789 860 OAS1 UP UP UP UP 790 C 861 OAS1 UP UP UP UP 791 862 TRIM10 DOWN DOWN UP DOWN 792 863 LIMK2 UP UP UP UP 793 864 LIMK2 UP UP UP UP 794 RBBP8 865 ATP5S DOWN DOWN DOWN DOWN 795 LOC6543SO 866 SMARCD3 UP UP UP UP 796 SLC30A1 867 PHC2 UP UP UP UP 797 PRSS23 868 SOX8 DOWN DOWN DOWN DOWN 798 AM3 869 LCK DOWN DOWN DOWN DOWN 799 870 DOWN DOWN DOWN DOWN 800 871 SAMD9L. 
UP UP UP UP 8O1 872 EHBP1 DOWN DOWN DOWN DOWN 802 LOC642788 873 E2F2 DOWN DOWN UP DOWN 803 ALPK1 874 CEACAM6 UP UP UP UP 804 LOC439949 875 LOC10O132394 UP DOWN DOWN UP 805 876 LOC728O14 DOWN DOWN DOWN DOWN 806 877 LOC728O14 DOWN DOWN DOWN DOWN 807 878 SIRPG DOWN DOWN DOWN DOWN 808 879 OPLAH UP UP UP UP 809 880 FTHL2 UP UP UP UP 810 881 CXORF21 UP UP UP UP 811 882 CACNG6 DOWN DOWN UP DOWN 812 883 C11 ORF75 UP UP UP UP 813 884 LY9 DOWN DOWN DOWN DOWN 814 885 LILRB4 UP UP UP UP 815 886 STAT2 UP UP UP UP 816 LOC441193 887 RAB20 UP UP UP UP 817 LOC2O2134 888 SOCS1 DOWN UP UP UP 818 KIAAO319L. 889 PLOD2 UP UP UP UP 819 890 UGDH DOWN DOWN DOWN DOWN 82O 891 MAK16 DOWN DOWN DOWN DOWN 821 892

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

GNB4 UP UP UP 893 CASP7 UP C 964 ANKRD22 UP UP UP 894 ZDHHC19 UP C 96S PROS1 UP UP UP 895 LOC732371 UP C 966 CD4OLG DOWN DOWN DOWN 896 DENND1A UP C 967 RIOK2 WN DOWN DOWN DOWN 897 EMR2 UP C 968 AFF1 UP UP UP 898 LOC6433O8 DOWN OW 969 HIST1H3D UP UP UP 899 ADA DOWN OW 970 SLC26A8 UP UP UP 900 LOC646527 DOWN yS WS 971 SLC26A8 UP UP UP 901 LOC643313 UP C 972 RNASE3 UP UP UP 902 GZMB DOWN 973 UBE2IL6 UP UP UP 903 OLIG2 DOWN WN 974 UBE2IL6 UP UP UP 904 GRINA DOWN 975 SSH1 UP DOWN UP 905 HLA-DPB1 DOWN 976 KRBA1 DOWN DOWN DOWN 906 MX1 977 SLC25A23 DOWN DOWN DOWN 907 THOC3 WN OW N WN 978 DTX3L UP UP UP 908 CHST13 C N 979 DOK3 UP UP UP 909 TRPM6 98O LOC644615 UP UP UP 910 981 SULT1B1 UP DOWN UP 911 JAK2 982 RASGRP4 UP UP UP 912 ARHGEF11 983 ALOX1SB UP UP UP 913 ARHGEF11 984 ADM UP UP UP 914 HOMER2 985 LOC391.825 DOWN DOWN DOWN 915 TACSTD2 986 LOC73O234 UP UP UP 916 CA4 987 HIST2H2AA3 UP UP UP 917 GAA 988 HIST2H2AA3 UP UP UP 918 IFITM3 989 LIMK2 UP UP UP 919 CLYBL 990 MMRN1 UP UP UP 920 CLYBL WN W 991 PADI2 UP DOWN UP 921 ANGPT1 S 992 FKBP1A UP UP UP 922 MME 993 GYG1 UP UP UP 923 ZNF408 994 UP DOWN UP 924 STAT1 995 DOWN DOWN DOWN 925 STAT1 996 DOWN DOWN DOWN 926 PNPLA7 OW N 997 D 3 DOWN DOWN DOWN 927 NDO 998 UP UP UP 928 PDZD8 999 DOWN DOWN DOWN 929 PDGFD W N OOO UP DOWN UP 930 CTSL1 : OO1 UP UP UP 931 HOMER3 C OO2 UP DOWN DOWN 932 CEP78 DOWN OO3 DOWN UP DOWN 933 SBK1 DOWN OO4 UP UP UP 934 ALG9 DOWN 005 UP UP UP 935 KIF27 DOWN OO6 LILRA6 UP UP UP 936 L1R2 UP OO7 SPTLC2 UP UP UP 937 RAB4OB DOWN OO8 CDA UP UP UP 938 MMP23B DOWN O09 PGD UP UP UP 939 UP O10 LOC10O130769 DOWN UP UP 940 PGLYRP1 UP O11 ECHDC2 DOWN DOWN DOWN 941 UHRF1 UP O12 KIF2OB DOWN DOWN DOWN 942 UP O13 B3GNT8 UP UP UP 943 UP O14 PYHIN1 DOWN DOWN DOWN 944 UP O15 LBH DOWN DOWN DOWN 945 OWN DOWN O16 LBH DOWN DOWN DOWN 946 DOWN yN O17 UP UP UP 947 HE N DOWN DOWN O18 BPI UP UP UP 948 MG DOWN DOWN O19 GAR1 DOWN DOWN DOWN 949 UAP1 DOWN DOWN O2O ST3GAL4 UP DOWN UP 950 OC390735 DOWN DOWN O21 TMEM19 DOWN DOWN DOWN 951 
OC641849 DOWN DOWN O22 DHRS12 UP UP UP 952 MP UP UP O23 DHRS12 UP UP UP 953 UP UP UP 954 FA1B UP UP O24 FAM26F UP UP UP 955 FA1B UP UP O25 FCRLA DOWN DOWN DOWN 956 FA1B UP UP O26 OSBPL7 DOWN DOWN DOWN 957 C S 2 UP UP O27 CTSB DOWN UP UP 958 PS2 UP UP O28 ALDH1A1 DOWN UP UP 959 F550 DOWN DOWN DOWN DOWN O29 SRRD DOWN UP DOWN 96.O BPL1A UP UP DOWN DOWN O3O TOLLIP UP UP UP 961 ORF1 DOWN DOWN DOWN DOWN O31 ICAM1 UP UP UP 962 MCTP2 UP UP UP UP O32 LAX1 OWN DOWN DOWN DOWN 963 EMR4 DOWN DOWN UP UP O33

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

LOC653.316 DOWN DOWN DOWN DOWN TGB5 UP UP UP 05 UP UP UP UP ZNF516 UP UP UP O6 FCRL6 DOWN DOWN DOWN DOWN ARHGAP26 UP UP UP O7 MRPS26 DOWN DOWN DOWN DOWN TIMP2 UP UP UP O8 RHOBTB3 DOWN DOWN UP UP FCGR1A UP UP UP 09 DIRC2 UP UP UP UP RHOH DOWN DOWN DOWN 10 CD27 DOWN DOWN DOWN DOWN FI44 UP UP UP 11 PLEKHG4 DOWN DOWN DOWN DOWN MTX3 DOWN DOWN DOWN 12 CDH6 UP UP UP CD74 DOWN UP UP 13 C4ORF23 UP UP UP LCK DOWN DOWN DOWN 14 HIST2H2AC UP UP UP TLR4 UP UP UP 15 SLC7A6 DOWN DOWN DOWN DOWN DOWN DOWN 16 SLC7A6 DOWN DOWN DOWN DSC2 UP UP UP 17 SLAMF6 DOWN DOWN DOWN CXORF45 DOWN DOWN DOWN 18 RETN UP UP DOWN ENPP4 DOWN DOWN DOWN 19 FAIM3 DOWN DOWN DOWN CD3OOC UP UP UP 2O PIK3C2A DOWN DOWN DOWN OASL UP UP UP 21 TMEM99 DOWN DOWN DOWN HPSE UP UP UP 22 LOC7284.11 DOWN DOWN DOWN MTHFD2 UP UP UP 23 TMEM194A DOWN DOWN DOWN GSTM2 DOWN DOWN DOWN 24 NAPEPLD DOWN DOWN DOWN OLFM4 UP UP UP 25 ACOX1 UP C ABHD12B UP UP UP 26 CTLA4 DOWN OW N OW N LOCA28417 UP UP UP 27 SCO2 UP C LOCA28417 UP UP UP 28 STK3 UP FCAR UP UP UP 29 FLT3LG DOWN OW N GTPBP3 DOWN DOWN DOWN 30 WASP UP KLF DOWN UP UP 31 FBXO31 DOWN HOPX DOWN DOWN DOWN 32 TDRD9 W THEBD UP DOWN UP 33 TDRD9 HIST1H2BG UP DOWN UP 34 LOC646144 LOC730995 DOWN 35 NUSAP1 OPN3 DOWN WN WN 36 GPR97 NOP56 DOWN DOWN DOWN 37 GPR97 DOWN DOWN DOWN 38 GPR97 NLRC3 DOWN DOWN DOWN 39 EMR1 LOC10O134083 UP UP UP 40 NR1EH3 COP1 UP UP UP 41 SLAMF6 CARD16 UP UP UP 42 CCDC106 yN SP140 UP UP UP 43 ODF3B CD96 DOWN DOWN DOWN 44 LOC100129904 DOWN UP DOWN 45 PADI4 POLD2 DOWN DOWN 46 LOC10O132858 L32 DOWN DOWN 47 PIK3AP1 LOC728.744 UP UP 48 ZNF792 FZD2 UP UP 49 DIP2A WN ZAP70 DOWN DOWN 50 OSCAR PYHIN1 DOWN DOWN 51 SCARF1 UP UP 52 CLIC3 FI27 UP UP 53 EANCE s S PFKFB2 UP UP S4 TECPR2 C PAM DOWN DOWN 55 P2RY1O W N WARS UP UP 56 ADORA3 DOWN DOWN 57 L18RAP W N TCN1 UP UP 58 DEFA3 LOC649839 DOWN DOWN 59 BRSK1 MMP9 UP UP 60 LOC647691 RIN3 UP UP 61 ALG8 TMEM194A DOWN DOWN 62 S1PR5 WN TAP2 UP UP 63 CPA3 C17ORF87 UP UP 64 BMX LOC7286SO UP UP 65 DDX58 PNMA3 DOWN DOWN 66 RHOBTB1 CPT1B UP 
UP 67 TNFRSF25 W N LTBP3 DOWN DOWN 68 LOC73O387 CCDC34 UP DOWN 69 OLR1 PRAGMIN DOWN DOWN 70 HERCS C9CORF91 UP UP 71 STAT1 SMPDL3A UP UP 72 NELF GPRS6 DOWN DOWN 73 STAP1 yN C14ORF147 UP UP 74 SLC2AS C SMARCD3 UP UP 75

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

FAM119A OWN DOW N DOW N OWN 76 AGTRAP UP UP UP 247 LOC642334 C 77 LOC646786 UP UP DOWN 248 ENOSF1 OW N 78 NCALD DOWN DOWN DOWN 249 FAR2 79 TTC25 DOWN DOWN DOWN 250 LOC441763 6W N 8O LOC646966 DOWN DOWN DOWN 251 TESC WW NN 81 TSPANS DOWN DOWN 252 CECR6 82 ZNF559 WN DOWN DOWN 253 KIAA1598 83 NFKB2 C UP 2S4 84 LOC652616 C UP 255 GPR109B 85 HLA-DOA DOWN 2S6 LRRN3 86 WARS UP 257 RNF213 WN yN 87 GBP2 UP 258 LRP3 88 AUTS2 OWN DOWN 259 ASGR2 89 GF2BP3 UP 260 ASGR2 90 OASL UP 261 ZSCAN18 91 DYSF UP 262 MCOLN2 WN 92 FLJ43093 DOWN 263 IFIT2 C 93 FAM159A DOWN 264 PLCEH2 94 MS4A14 DOWN 26S MAP7 95 TGFB11 266 GBP4. s N 96 RADS1C OW N 267 MGMT DOWN DOWN 97 CALD1 268 GAL3ST4 DOWN 98 LOC441073 269 C2ORF89 DOWN 99 CCNC 270 TXNDC3 UP 200 LOC730281 271 IFIH1 UP 2O1 MUC1 272 PRRG4 UP 2O2 C14ORF124 273 LOC641693 UP 2O3 RPL14 274 LOCA28093 UP 204 APOL6 275 TNFAIP8L1 DOWN DOWN 205 276 AP3M2 DOWN DOWN DOWN 2O6 KCTD12 277 BACH2 DOWN DOWN DOWN 2O7 TGAX 278 BACH2 DOWN DOWN DOWN 208 FIT3 279 C9CORF123 DOWN DOWN DOWN 209 LPCAT2 28O CACNA1I DOWN DOWN DOWN 210 ZNF529 281 LOC10O132287 UP UP UP 211 MRPL9 282 CAMK1D UP UP UP 212 AGTRAP 283 ANKRD33 UP UP UP 213 LOC4O2112 284 CCR6 DOWN DOWN DOWN DOWN 214 LOC10O134822 C C 285 ALDH1A1 DOWN DOWN UP 215 SH2D1B W N 286 LOC10O132797 DOWN UP DOWN DOWN 216 MPO 287 CD163 UP UP UP 217 LOC100131967 288 ESAM UP UP UP 218 LOC440459 289 FCAR UP UP UP 219 FAM44B 290 TCN2 UP UP UP 220 ACOT9 291 LOC10O1292O3 DOWN DOWN DOWN 221 SLC37A1 292 CD6 DOWN DOWN DOWN DOWN 222 LOC72991S 293 B3GNT1 DOWN DOWN DOWN 223 PDZK1P1 294 NEK8 DOWN DOWN DOWN 224 S100A12 295 SLC38AS UP UP UP 225 RAB3IL1 296 CD3E DOWN DOWN DOWN 226 TMEM204 297 DOWN DOWN DOWN 227 CXCL10 298 GPR183 DOWN DOWN DOWN 228 TSR1 299 CCDC76 DOWN DOWN DOWN 229 NSUNS 3OO MS4A1 DOWN DOWN DOWN 230 MXD3 301 IFIT1 DOWN C 231 LILRAS 3O2 MED13L UP OW N OWN 232 CKAP4 303 SLC26A8 UP 233 C6ORF190 3O4 NOV DOWN W N 234 ECGF1 305 FL2OO3S DOWN 235 LDLRAP1 306 UGT1A3 UP 236 GRB10 307 LOC6S3600 UP 237 FCRL3 3O8 LOC642684 UP 238 
LOC731275 309 KIAAO319L. UP 239 ZFP91 310 KLRD1 DOWN W N 240 CTRL 311 TRIM22 UP 241 BCL6 312 C4ORF18 UP 242 SAMD3 313 TSPAN3 DOWN 243 LOC647436 314 TSPAN3 DOWN 244 CLC 315 LOCA28.748 DOWN s S 245 GK 316 DNAJC3 UP 246 LOC10O133565 OWN 317

TABLE 1-continued
List of 1446 genes that differentiate between lung cancer, pneumonia, TB and sarcoidosis.
Symbol  Cancer vs Control  Pneumonia vs Control  Sarcoidosis vs Control  TB vs Control  SEQ ID NO:

OAS2          UP    DOWN  UP    UP    1318
LOC644937     DOWN  DOWN  DOWN  DOWN  1319
SIRPD         UP    UP    UP    UP    1320
GPBAR1        UP    DOWN  UP    UP    1321
GNL3          DOWN  DOWN  DOWN  DOWN  1322
CD79B         DOWN  DOWN  DOWN  DOWN  1323
ELF2          UP    UP    UP    UP    1324
GAA           UP    UP    UP    UP    1325
CD47          DOWN  DOWN  DOWN  DOWN  1326
NMT2          DOWN  DOWN  DOWN  DOWN  1327
MATR3         DOWN  DOWN  DOWN  DOWN  1328
TMEM107       UP    DOWN  DOWN  DOWN  1329
GCM1          UP    UP    UP    UP    1330
RORA          DOWN  DOWN  DOWN  DOWN  1331
MGAM          UP    UP    UP    UP    1332
LOC100132491  UP    UP    UP    UP    1333
KRT72         DOWN  DOWN  DOWN  DOWN  1334
SEPT4         UP    UP    UP    UP    1335
ACADVL        UP    UP    UP    UP    1336
ANXA3         UP    UP    UP    UP    1337
MEGF9         UP    UP    UP    UP    1338
MEGF9         UP    UP    UP    UP    1339
PTPRT         UP    UP    UP    UP    1340
HLA-DRB4      DOWN  DOWN  UP    UP    1341
GHRL          DOWN  UP    UP    UP    1342
ALAS2         DOWN  UP    UP    UP    1343
FFAR2         UP    UP    UP    UP    1344
MPZL2         DOWN  UP    UP    UP    1345
PML           DOWN  UP    UP    UP    1346
HLA-DQA1      DOWN  DOWN  UP    UP    1347
CEACAM8       UP    UP    UP    UP    1348
SH3KBP1       DOWN  DOWN  DOWN  DOWN  1349
TRPM2         UP    UP    UP    UP    1350
CUX1          UP    UP    UP    UP    1351
LOC648390     DOWN  DOWN  UP    DOWN  1352
SUV39H1       DOWN  DOWN  DOWN  DOWN  1353
RNF13         UP    UP    UP    UP    1354
USF1          UP    UP    UP    UP    1355
WAPA          UP    UP    UP    UP    1356
ALOX15        DOWN  DOWN  UP    DOWN  1357
CD79A         DOWN  DOWN  DOWN  DOWN  1358
DPRXP4        UP    UP    UP    UP    1359
LOC652750     DOWN  UP    UP    UP    1360
ECM1          UP    UP    DOWN  UP    1361
ST6GAL1       DOWN  DOWN  DOWN  DOWN  1362
KLHL3         DOWN  DOWN  DOWN  DOWN  1363
RTP4          DOWN  UP    UP    UP    1364
FAM179A       DOWN  DOWN  UP    DOWN  1365
HDC           DOWN  DOWN  UP    DOWN  1366
SUMO1P1       UP    UP    DOWN  UP    1367
SACS          DOWN  DOWN  DOWN  DOWN  1368
C9ORF72       UP    UP    UP    UP    1369
C9ORF72       UP    UP    UP    UP    1370
LOC652726     DOWN  DOWN  DOWN  DOWN  1371
PVRIG         DOWN  DOWN  DOWN  DOWN  1372
PPP1R16B      DOWN  DOWN  DOWN  DOWN  1373
NSUNT         UP    UP    DOWN  DOWN  1374
NSUNT         UP    UP    DOWN  UP    1375
UHRF2         DOWN  DOWN  DOWN  DOWN  1376
ZNF783        DOWN  DOWN  DOWN  DOWN  1377
LOC441013     DOWN  DOWN  DOWN  DOWN  1378
(illegible)   UP    UP    UP    UP    1379
LOC100129343  UP    UP    UP    UP    1380
OSM           UP    UP    UP    UP    1381
UNC93B1       UP    UP    UP    UP    1382
DNAJC30       DOWN  DOWN  DOWN  DOWN  1383
FLJ14166      UP    UP    DOWN  DOWN  1384
C9ORF72       UP    UP    DOWN  UP    1385
SAMD4A        UP    UP    UP    UP    1386
RNY4          DOWN  DOWN  DOWN  DOWN  1387
F5            UP    UP    UP    UP    1388
PARP15        DOWN  DOWN  DOWN  DOWN  1389
PAFAH2        DOWN  DOWN  DOWN  DOWN  1390
COL17A1       UP    UP    UP    UP    1391
LOC651524     UP    UP    UP    UP    1392
TYMP          UP    UP    UP    UP    1393
LOC389672     DOWN  DOWN  DOWN  DOWN  1394
ABCB1         DOWN  DOWN  DOWN  DOWN  1395
LOC644852     DOWN  DOWN  UP    UP    1396
TARP          DOWN  DOWN  DOWN  DOWN  1397
SLAMF7        UP    UP    UP    UP    1398
FRMD3         UP    UP    UP    UP    1399
LOC648984     UP    UP    UP    UP    1400
PLAUR         UP    UP    UP    UP    1401
LOC100132119  UP    UP    UP    UP    1402
KLRG1         DOWN  DOWN  DOWN  DOWN  1403
NTS2          DOWN  DOWN  DOWN  DOWN  1404
MYC           DOWN  DOWN  DOWN  DOWN  1405
HIST1H4H      UP    UP    UP    UP    1406
KBTBD8        DOWN  DOWN  DOWN  DOWN  1407
C9CORF45      DOWN  DOWN  DOWN  DOWN  1408
GBP6          UP    UP    UP    UP    1409
KIFAP3        DOWN  DOWN  DOWN  DOWN  1410
HSPC159       UP    UP    UP    UP    1411
ZNF224        DOWN  DOWN  DOWN  DOWN  1412
SOCS3         UP    UP    UP    UP    1413
GOLGA8B       DOWN  DOWN  DOWN  DOWN  1414
OLIG1         DOWN  DOWN  UP    DOWN  1415
TNFRSF4       DOWN  DOWN  UP    DOWN  1416
LOC100133583  DOWN  DOWN  UP    UP    1417
ARL4A         DOWN  DOWN  DOWN  DOWN  1418
ASNS          DOWN  DOWN  DOWN  DOWN  1419
ITGAX         UP    UP    UP    UP    1420
LOC153561     UP    UP    UP    UP    1421
GSTM1         DOWN  DOWN  DOWN  DOWN  1422
OAS2          DOWN  DOWN  UP    UP    1423
OAS2          UP    UP    UP    UP    1424
TRIM2S        UP    UP    UP    UP    1425
ABHD14A       DOWN  DOWN  DOWN  DOWN  1426
LOC642342     UP    UP    DOWN  DOWN  1427
GPRS6         DOWN  DOWN  DOWN  DOWN  1428
C4ORF18       UP    UP    UP    UP    1429
AK1           DOWN  DOWN  DOWN  DOWN  1430
PIK3R6        DOWN  UP    UP    UP    1431
HSPE1         DOWN  DOWN  DOWN  DOWN  1432
ASPHD2        DOWN  UP    UP    UP    1433
DHRS9         UP    UP    UP    UP    1434
GRN           UP    UP    UP    UP    1435
BEND7         UP    UP    UP    UP    1436
BOAT          DOWN  DOWN  DOWN  DOWN  1437
LOC728323     UP    UP    DOWN  UP    1438
LOC100134300  UP    UP    UP    UP    1439
SDSL          UP    UP    UP    UP    1440
TNFAIP6       UP    UP    UP    UP    1441
ARHGAP24      UP    UP    UP    UP    1442
LOC402176     UP    UP    UP    DOWN  1443
LOC441019     DOWN  DOWN  UP    UP    1444
FAM134B       DOWN  DOWN  DOWN  DOWN  1445
ZNF573        DOWN  DOWN  DOWN  DOWN  1446

[0052] Distinct biological pathways were found to be associated with the pulmonary granulomatous diseases, differing from those associated with the acute pulmonary diseases (pneumonias) and chronic lung diseases (lung cancers).

[0053] Having established by the derived 1446-transcript signature that the pulmonary granulomatous diseases had similar transcriptional profiles to each other but different to those of the pneumonia and lung cancer patients, we wished to determine the main biological pathways associated with the 1446-transcripts in relation to each disease (SEQ ID NOS.: 1 to 1,446). Unsupervised clustering of the 1446 transcripts revealed three main clusters of transcripts as can be seen from the vertical dendrogram (FIG. 2). Ingenuity Pathway Analysis (IPA) of the main clusters of transcripts revealed that the TB and sarcoidosis samples were associated with over-abundance of the interferon signalling pathway and other immune response pathways (FIG. 2). However the pneumonia and lung cancer samples were associated with over-abundance of pathways linked with inflammation. All four diseases were associated with under-abundance of T and B cell pathways. Using the 1,446 genes or probes, the skilled artisan can select subsets of genes that will best differentiate between two, three or four pulmonary diseases by taking advantage of both the level of expression but also whether the gene is over- or under-expressed. As taught herein, certain subsets are demonstrated to be unique to certain pulmonary diseases, but can also be used to identify if a patient or subject has one, two, three or four of the pulmonary diseases.

[0054] FIG. 2. Three dominant clusters of transcripts in the unsupervised clustering of the 1446 transcripts are associated with distinct Ingenuity Pathway Analysis canonical pathways. Each of the three dominant clusters of transcripts is associated with different study groups in the Training Set. The top transcript cluster is over-abundant in the pneumonia and lung cancer patients and significantly associated with IPA pathways relating to inflammation (Fisher's exact p<0.05 Benjamini Hochberg). The middle transcript cluster is over-abundant in the TB and sarcoidosis patients and significantly associated with interferon signalling and other immune response IPA pathways (Fisher's exact p<0.05 Benjamini Hochberg). The bottom transcript cluster is under-abundant in all the patients and significantly associated with T and B cell IPA pathways (Fisher's exact p<0.05 Benjamini Hochberg).

[0055] The sarcoidosis patients' heterogeneous transcriptional profiles were explained by their clinical phenotype.

[0056] From the unsupervised clustering of the 1446-transcripts it can be seen that the sarcoidosis patients fell into two groups, those that clustered with the TB patients and those that clustered with the healthy controls (FIG. 1). As the blood transcriptional profile is a snapshot view of the host's immune response we applied the same approach to clinically phenotyping the patients to understand if their clinical classification correlates with their transcriptional profile. However there is no consensus on how to reliably assess disease activity and current classification systems all require continuous follow-up of the patient over a prolonged period of time before their activity status can be stated (1). Therefore a clinical classification decision tree was devised based on clinical variables that are both routinely measured in sarcoidosis patients and have been shown to be associated with disease activity (data not shown). Using exactly the same analysis strategy as for the 1446-transcripts, but this time with the sarcoidosis patients classified as either active or non-active, 1396-transcripts were found to be differentially expressed across all the disease groups. FIGS. 3A and 3B show that the sarcoidosis patients clinically classified as active sarcoidosis display similar transcriptional signatures to the TB patients but are very distinct from the transcriptional signatures of the clinically classified non-active sarcoidosis patients, which in turn resemble the healthy controls. 1396 transcripts are differentially expressed in the whole blood of healthy controls, pulmonary TB patients, active sarcoidosis patients, non-active sarcoidosis patients, pneumonia patients and lung cancer patients. FIG. 3A shows the 1396 transcripts and Training Set patients' profiles organised by unsupervised hierarchical clustering. A dotted line is added to the heatmap to clarify the main clusters generated by the clustering algorithm. Transcript intensity values are normalised to the median of all transcripts. FIG. 3B shows the molecular distance to health of the 1396 transcripts in the Training and Test Sets, which quantifies the transcriptional change relative to the controls. The mean and SEM were compared between each disease group (ANOVA with Tukey's multiple comparison test).

[0057] Unsupervised hierarchical clustering again showed the same clustering pattern as seen with the 1446-transcripts (FIG. 3A). Applying the clinical classification decision tree it could be seen that those sarcoidosis patients clustering with the TB patients had been classified as active and those with the healthy controls as non-active. This was further validated in two independent cohorts, the Test and Validation Sets (data not shown). In addition, it was found that the applied clinical classification decision tree was able to predict if the sarcoidosis patients' transcriptional profiles clustered with the TB patients or the healthy controls better than any routinely measured single clinical variable (data not shown). Furthermore the clinical classification decision tree was still superior in its clustering predictive ability even if the single clinical variables with the highest predictive values were used in conjunction with each other or even when used together with the clinical classification criteria (data not shown). Molecular distance to health (MDTH) demonstrates the quantification of transcriptional change relative to the controls (FIG. 3B) (2). By applying this algorithm to all the disease groups for the 1396-transcripts it could be seen that the non-active sarcoidosis MDTH score was not significantly different from the controls, however the active sarcoidosis MDTH score was significantly different from the controls. In addition the TB patients' MDTH score was significantly higher than the active sarcoidosis patients' score. Lung cancer and pneumonia both had significantly higher scores than the controls, with pneumonia significantly higher than cancer. Pneumonia and TB had the highest MDTH scores. The significant differences in the MDTH scores between the patient groups suggest there is a quantitative as well as qualitative difference in blood transcriptional signatures between these similar pulmonary diseases.

[0058] Three different data mining strategies showed the same findings that both TB and active sarcoidosis were dominated by IFN-inducible genes, in contrast to pneumonia and lung cancer, which were dominated by inflammatory genes.

[0059] To further understand the biological pathways associated with each disease group we undertook three different data mining strategies to ensure our findings were robust and consistent. The three approaches applied were: modular analysis, Ingenuity Pathway Analysis and annotation of the top differentially expressed genes for each disease group.

[0060] To carry out modular analysis all detectable genes (15,212 transcripts) in the whole Training Set dataset were analysed. Each module corresponds to a set of co-regulated genes that were assigned biological functions by unbiased literature profiling (3). FIGS. 4A to 4E show modular analysis of the Training Set, which shows the similarity of the biological pathways associated with TB and sarcoidosis (particularly overexpression of the IFN modules), differing from pneumonia and lung cancer (particularly overexpression of the inflammation modules). In FIG. 4A, gene expression levels of all transcripts that were significantly detected compared to background hybridisation (15,212 transcripts, p<0.01) were compared in the Training Set between each patient group (TB, active sarcoidosis, non-active sarcoidosis, pneumonia, lung cancer) and the healthy controls. Each module corresponds to a set of co-regulated genes that were assigned biological functions by unbiased literature profiling. A red dot indicates significant over-abundance of transcripts and a blue dot indicates significant under-abundance (p<0.05). The colour intensity correlates with the percentage of genes in that module that are significantly differentially expressed. The modular analysis can also be represented in graphical form as shown in FIGS. 4B-E, including both the Training and Test Set samples. FIG. 4B shows the percentage of genes significantly overexpressed in the 3 IFN modules for each disease. FIG. 4C shows the fold change of the expression of the genes present in the IFN modules compared to the controls. FIG. 4D shows the percentage of genes significantly overexpressed in the 5 inflammation modules for each disease. FIG. 4E shows the fold change of the expression of the genes present in the inflammation modules compared to the controls. TB and active sarcoidosis show significant overexpression of the IFN modules compared to the other pulmonary disease groups (FIG. 4A). In contrast the pneumonia and cancer patients showed significant overexpression of the inflammation modules compared to TB and active sarcoidosis. These findings were then verified by modular analysis of the Test Set (Figure E7). The modular analysis therefore also substantiates our results determined from pathways linked to the 1446-transcripts signature described earlier (FIG. 2). TB patients showed a significant increase in the number of IFN genes (FIG. 4B), and their degree of expression (FIG. 4C), compared to the active sarcoidosis patients, demonstrating a quantitative difference in the IFN-inducible signature between TB and active sarcoidosis (FIG. 4B-C). The same genes in the IFN module that were overexpressed in the active sarcoidosis patients were also overexpressed in the TB patients (data not shown). Pneumonia and lung cancer showed a significant increase in the number of genes present in the inflammation modules (FIG. 4D), and their degree of expression (FIG. 4E), in comparison to TB and active sarcoidosis (FIG. 4A, D-E). Pneumonia patients also showed a significant overexpression of the number of genes present in the neutrophil module compared to all the other pulmonary diseases (Figure E8).

from the mean of the controls, Mann Whitney Benjamini Hochberg p<0.01). FIG. 5A shows the IPA canonical pathways used to determine the most significant pathways (i-iv) associated with each disease relative to the other diseases (Fisher's exact Benjamini Hochberg). The bottom X-axis and bars of each graph indicate the log(p-value) and the top X-axis and line indicate the percentage of genes present in the pathway. The genes in the EIF2 signalling pathway are predominately under-abundant genes, however the genes in the other three pathways are predominantly over-abundant relative to the controls. Pathways above the blue dotted line are significant (p<0.05). FIGS. 5B, 5C and 5D show the interferon signalling IPA pathway overlaid onto each disease group. Coloured genes are differentially expressed in that disease group compared to their matched controls (Fisher's exact p<0.05). Red genes represent over-abundance and green under-abundance.

[0061] The Comparison IPA reveals the most significant pathways when comparing across the diseases. The top four significant pathways were related to protein synthesis (EIF2 signalling) and immune response pathways (interferon signalling, role of pattern recognition receptors in recognition of bacteria and viruses and antigen presentation pathway) (FIG. 5A). The prominence of the EIF2 signalling pathway was driven by the pneumonia patients. The genes were significantly under-abundant in the pneumonia patients compared to the other pulmonary diseases. Many other genes related to protein synthesis (including eukaryotic initiation factors and ribosomal proteins) and the unfolded protein response (a stress response to excessive protein synthesis) were also significantly under-abundant in the pneumonia patients compared to the other pulmonary diseases, e.g. PERK, CHOP, ABCE1 (data not shown). The significance of the three immune response pathways was driven predominantly by the TB patients, but also by the sarcoidosis patients. The pathways were more significant (bottom X-axis bar graph in FIG. 5A) and contained a higher number of genes (top X-axis line graph in FIG. 5A) in both TB and active sarcoidosis than compared to the other pulmonary diseases, again demonstrating the similarity of the biological pathways underlying these pulmonary granulomatous diseases. However the interferon signalling pathway was more significant (bottom X-axis bar graph FIG. 5A) and contained a higher number of genes in the TB than the active sarcoidosis patients and were not repre
Whole blood gene expression may cor sented in pneumonia and lung cancer (top X-axis line graph relate with the bloods cell composition or with the gene FIG.5A, FIG. 5B and FIG.5C). expression in particular cellular populations. For the neutro 0062. The third data mining strategy just examined the top phil genes there was a significant correlation between the 50 over-abundant differentially expressed transcripts for each neutrophil module and the neutrophil count for all the pneu disease. It could be seen that the transcripts correlate well monia patients versus controls (Pearson's correlation, p<0. with the findings from the modular and IPA analysis as both 0001). The second data mining approach, comparison IPA, the TB and active sarcoidosis top 50 over-abundant tran only used those genes that were differentially expressed scripts were dominated by IFN-inducible genes e.g. IFITM3 between each disease group and a set of controls matched by (SEQ ID NO.:989), IFIT3 (SEQ ID NO.1279), GBP1 (SEQ ethnicity and gender (>1.5 fold change from the mean of the ID NO.:226), GBP6 (SEQID NO.: 1409), CXCL10 (SEQID controls, Mann Whitney Benjamini Hochberg p<0.01; NO.1298), OAS1 (SEQID NO.:790), STAT1 (SEQID NO.: TB-2524, active sarcoidosis=1391, pneumonia=2801 and 995), IFI44L (SEQ ID NO.: 1013), FCGR1B (SEQ ID NO.: lung cancer—1626 differentially expressed transcripts). FIG. 63) (Table 6). However the expression fold change was much 5A shows a comparison Ingenuity Pathway Analysis of the higher in the TB patients than the active sarcoidosis patients. four disease groups compared to their matched controls In addition the pneumonia top 50 over-abundant transcripts reveals the four most significant pathways. 
Differentially were dominated by antimicrobial neutrophil-related genes expressed genes were derived from the Training Set by com e.g., ELANE (SEQ ID NO.:330), DEFA1B (SEQ ID NO.: paring each disease to healthy controls matched for ethnicity 1024), MMP8 (SEQID NO.:521), CAMP (SEQID NO.:40), and gender: TB-2524, active sarcoidosis=1391, pneumo DEFA3 (SEQ ID NO.1088), DEFA4 (SEQ ID NO.:231), nia=2801 and lung cancer—1626 transcripts (>1.5 fold change MPO (SEQ ID NO.1287), LTF (SEQ ID NO.:506). The US 2015/03 15643 A1 Nov. 5, 2015 24 genes FCGR1A, B and C (SEQID NO.1109,63, 50, respec such that there was no significant difference between the tively)) were over-abundant in the top 50 transcripts of all four pneumonia post-treatment transcriptional profiles and the pulmonary diseases. A 4-set Venn diagram of the differen healthy controls (FIGS. 6A & B). We have previously studied tially expressed genes was able to demonstrate the unique the blood transcriptional response of a cohort of active TB genes for each disease group (FIG.9 and Table 7). There were patients from South Africa before and after successful anti over three times the number of unique TB genes than unique TB treatment (4). Therefore we used the same 1446-tran active sarcoidosis genes of which only the TB unique genes scripts that were derived from this present study to assess the were significantly associated with the IPA IFN-signalling transcriptional response of these South African TB patients pathway. The unique pneumonia genes were associated with before and after treatment, compared to their latent TB con an under-abundance of pathways related to protein synthesis. trols. The MDTH score of the untreated active TB patients The unique lung cancer genes were associated with over were significantly different from the latent TB controls how abundance of inflammation related pathways. 
The overlap ever the transcriptional response after treatment again ping genes common to all four disease groups were signifi reversed with no significant difference between the treated cantly associated with under-abundance of T and B cell active TB patients and the latent TB controls (FIG. 6C). pathways. 0066. The treated sarcoidosis patients showed a variable 0063 TB and pneumonia patients after treatment showed clinical response after immunosuppressive treatment initia a diminishment of their transcriptional profiles to resemble tion as determined by their practising physician (clinical data the controls however the sarcoidosis patients who respond to not shown but available). If the physician increased their glucocorticoids showed a significant increase in their tran treatment at their clinic follow-up the patient was categorised Scriptional activity. as having an inadequate treatment response but if the phy 0064 FIGS. 6A to 6D show both modular analysis and sician continued the same treatment or reduced their treat molecular distance to health reveal that the blood transcrip ment this was categorised as having a good treatment tome of the pneumonia and TB patients after successfully response. Applying the same two data mining strategies as completing treatment are no different from the healthy con used for the pneumonia patients it could clearly be seen that trols however the sarcoidosis patients show an overexpres the sarcoidosis patients who had a good clinical response to sion of inflammation genes during a clinically Successful glucocorticoids had a significant overexpression of inflam response to glucocorticoids. FIG. 6A shows a modular analy matory genes that was not seen when the same or the different sis for gene expression levels of all transcripts that were sarcoidosis patients had an inadequate response to immuno significantly detected compared to background hybridisation suppressive treatment (FIGS. 6A & D). 
The majority of the (p<0.01) were compared between the healthy controls and inflammatory genes that were overexpressed in the untreated each of the following the patient groups: pre-treatment pneu pneumonia and lung cancer patients were also overexpressed monia, post-treatment pneumonia patients and pre-treatment in the good-treatment response sarcoidosis patients (Table 8), sarcoidosis, inadequate treatment response sarcoidosis and but many more transcripts were overexpressed in the good good treatment response sarcoidosis patients. A red dot indi treatment response sarcoidosis patients (clinical data not cates significant over-abundance of transcripts and a blue dot shown but available). The term inflammation comprises indicates under-abundance (p<0.05). The colour intensity many forms and therefore there is a diversity of genes that are correlates with the percentage of genes in that module that are called inflammatory. Interestingly many of the top 50 over significantly differentially expressed. MDTH demonstrates expressed inflammatory genes in the good-treatment the quantification of transcriptional change after treatment in response sarcoidosis patients are known to be anti-inflamma the 1446-transcripts relative to controls for pre-treatment tory genes which are invariably induced alongside proinflam pneumonia, post-treatment pneumonia patients, pre-treat matory genes in what is termed an inflammatory response, ment TB and post-treatment TB and and pre-treatment sar e.g., IL1R2 (SEQ ID NO.1007), DUSP1, IL18R (SEQ ID coidosis, inadequate treatment response sarcoidosis and good NO.:239), C-FOS, IKBO. and MAPK1, as well as pro-inflam treatment response sarcoidosis patients. The mean and SEM matory genes (Table 8). was compared between each disease group (ANOVA with 0067. The interferon-inducible genes were most abundant Tukey's multiple comparison test). FIG. 6B, Pneumonia in the neutrophils in both TB and sarcoidosis. It was previ patients: FIG. 
6C, TB patients from the Bloom et al., 2012 ously shown in the Berry, et al., 2010 publication (5) that the (12), study carried out in South Africa, the controls in this active TB signature was dominated by a neutrophil-driven study were participants with latent TB; FIG. 6D Sarcoidosis IFN-inducible gene profile, consisting of both IFN-Y and type patients. IIFN-O?3 signalling (5). Therefore the inventors identified the 0065. More specifically, having determined the blood main cell populations driving the IFN-inducible signature in transcriptional signatures of untreated patients with the pull the active sarcoidosis patients. A new cohort of patients (TB monary granulomatous diseases TB and sarcoidosis and the and active sarcoidosis) were recruited and controls to test the infectious disease community and acute lung diseases of same IFN-inducible genes as used in the Berry, et al., 2010 acquired pneumonia we next sought to examine their tran publication (5) in the purified leucocyte populations of TB Scriptional response to treatment. The pneumonia patients and sarcoidosis patients who had an IFN-inducible signature were all followed-up at least 6 weeks after their hospital present in whole blood (Table 9). discharge and showed a good clinical response to their treat 0068 FIGS. 7A to 7E show that interferon-inducible gene ment with standard antibiotics (clinical data not shown but expression is most abundant in the neutrophils in both TB and available). Using two completely different data mining strat sarcoidosis. The expression of interferon-inducible genes egies, modular analysis (all detectable transcripts were analy was measured in purified leucocyte populations from whole sed) and MDTH (only the 1446-transcripts were analysed), it blood. FIG. 
7A is a heatmap that shows the expression of could be seen that the pneumonia patients after Successful IFN-inducible transcripts, from the Berry, et al., 2010 study treatment showed a reversal of their transcriptional profiles (5), for each disease group normalised to the controls for that US 2015/03 15643 A1 Nov. 5, 2015 cell type. FIG.7B shows the expression fold change in the TB TB profiles compared to the active sarcoidosis profiles (Table samples of the same IFN-inducible transcripts. FIG. 7C 2). Two recent publications also described gene lists that shows the expression fold change in the sarcoidosis samples could distinguish TB from all sarcoidosis patients (7, 8). of the same IFN-inducible transcripts. FIG. 7D shows the These previously published gene lists were derived from dif expression fold change in the TB samples of all the genes ferent cohorts and used different microarray platforms. We present in the three interferon modules compared to the con used a class prediction machine learned algorithm, Support vector machines (SVM), to test our gene list and the two trols. FIG. 7E shows the expression fold change in the sar previously published gene lists for their ability to predict coidosis samples of all the genes present in the three inter whether a transcriptional profile belonged to a TB patient or feron modules compared to the controls. not. The prediction model is built using the transcriptional 0069. Again the neutrophils displayed the highest relative signature from samples with known disease-types to predict abundance of IFN-inducible genes in active TB (FIGS. 7A, the classification of a new collection of samples. The SVM 7B & 7D). The neutrophils also had the highest abundance of model should therefore be built in one study cohort and run in IFN-inducible genes in the sarcoidosis patients, although to a an independent cohort to prevent over-fitting the predictive lesser extent than was seen in the TB patients (FIGS. 7A, 7C signature. 
This was possible for all our cohorts. Where the & 7E). The monocytes showed a higher abundance of IFN study cohorts used a different microarray platform the SVM inducible genes than the lymphocytes in both the TB and model had to be re-built in that cohort. However to reduce the sarcoidosis patients (FIG. 7A-E), as previously shown (5). effects of over-fitting the same parameters were used every 0070 FIG. 8 shows the results for each of the pulmonary time the SVM model was built. diseases using the genes expressed in a neutrophil module. FIG. 8A shows the percentage of genes significantly overex TABLE 2 pressed in the neutrophil module for each disease in both the 144 transcripts. The 144 transcripts are differentially Training and Test set. FIG. 8B shows the fold change of the expressed genes between the TB and active sarcoidosis expression of the genes present in the neutrophil module profiles in the Training Set compared to the controls. significance analysis of microarray Cls 0.05, fold change e 1.5). 0071 FIG. 9 is a 4-set Venn diagram comparing the dif Fold Change ferentially expressed genes for each disease group compared TB vs Active to their ethnicity and gender matched controls. Differentially Symbol Sarcold Regulation expressed genes were derived from the Training Set by com C1QB 10.6 UP paring each disease to healthy controls matched for ethnicity LOC10O13356S 6.4 UP and gender: TB-2524, active sarcoidosis=1391, pneumo TDRD9 5.3 UP nia=2801 and lung cancer—1626 transcripts (>1.5 fold change ABCA2 5.3 UP from the mean of the controls, Mann Whitney Benjamini SMARCD3 5.3 UP CACNA1E S.1 UP Hochberg p-0.01). The 4-set Venn diagram was created using HP 4.2 UP Venny (13). IPA canonical pathways was used to determined NTN3 4.2 UP the most significant pathways associated with the unique LOC10OOO8589 3.3 UP transcripts for each disease (Fisher's exact p-0.05). Active CARD17 3.3 UP LOC441763 3.2 UP Sarc=active sarcoidosis. ERLIN1 3.1 UP 0072 FIG. 
10A is a Venn diagram comparing the gene lists SLPI 3.1 UP used in the class prediction. The gene lists were obtained from SLC26AB 2.9 UP AIM2 2.8 UP this study (144 Illumina probes), Maertzdorf, et al., study (8) NCA 2.8 UP (100 Agilent probes of which only 76 probes were recognised OPLAH 2.7 UP as genes using NIH DAVID Gene ID Conversion Tool) and LPCAT2 2.6 UP Koth, et al., study (7) (50 genes obtained from a Affymetrix SEPT4 2.5 UP DISC1 2.5 UP platform although analysis also included data obtained from 2FP91 2.5 UP alternative studies from GEO databases which used other UBE22 2.4 UP microarray platforms the majority from the Berry et al., 2010 KREMEN1 2.4 UP (5) by current applicants). In the Illumina platform used to ALPL 2.3 UP LOC10OOO8589 2.3 UP compare these lists some genes are represented by more than KCN816 2.2 UP one transcript for example the 50 genes in Koth etal study (7) C19Crf59 2.2 UP translate to 77 Illumina probes/transcripts. FCGR1A 2.2 UP 0073 144-transcripts were able to distinguish with good SPATA13 2.2 UP ADM 2.2 UP sensitivity and specificity the TB patients from the other CDKSRAP2 2.2 UP pulmonary diseases and healthy controls. SNORA73B 2.2 UP 0074 Although the transcriptional profiles of the TB and TncRNA 2.1 UP PPAP2C 2.1 UP active sarcoidosis patients appeared very similar we wished IFITM3 2.1 UP to determine if a gene list could distinguish the TB samples, FCGR1B 2.1 UP from all the other patient and control samples. Therefore we JMJD6 2.1 UP compared the TB transcriptional profiles to the most similar HIST2H3D 2.1 UP LMNB1 2.0 UP group, active sarcoidosis, to derive a set of differentially S100A12 2.0 UP expressed genes. 144 transcripts were differentially FCGR1C 2.0 UP expressed between the TB and active sarcoidosis transcrip LOC653591 2.0 UP tional profiles from the Training Set (significance analysis of LOC10O132394 2.0 UP microarray q<0.05, fold changes 1.5). 
Many of the transcripts SLC26A8 2.0 UP were IFN-inducible genes and were all over-abundant in the ANXA3 2.0 UP US 2015/03 15643 A1 Nov. 5, 2015 26

TABLE 2-continued
144 transcripts. The 144 transcripts are differentially expressed genes between the TB and active sarcoidosis profiles in the Training Set (significance analysis of microarray q ≤ 0.05, fold change ≥ 1.5).

Symbol          Fold Change TB vs Active Sarcoid   Regulation
NLRC4
LOC100134364
LILRA6
LOC653610
CST7
LILRB4
MSL3L1
HIST2H2BG
OSM
LILRA5
GPR97
HIST2H2AC
LILRA5
TLR5
HIST2H2AA4
LOC728519
SMARCD3
LOC641710
HIST2H2BE
ITPRIPL2
FKBP5
IFNAR1
LY96
LOC728417
DHRS13
SLC22A4
LOC645159
IL4R
FLJ32255
HIST2H2AA3
PLAC8
SH3GLB1
PLSCR1
B4GALT5
C
COP1
PROK2           1.6                                UP
IFI30           1.6                                UP
FCER1G          1.5                                UP
ZNF438          1.5                                UP
EEF1D           1.5                                UP
MIR21           1.5                                UP
NGFRAP1         1.5                                UP
PGS1            1.5                                UP
KIF1B           1.5                                UP
C16orf57        1.5                                UP
ANKRD33         1.5                                UP
MXD4            -1.5                               DOWN
ZSCAN18         -1.6                               DOWN
MEF2D           -1.6                               DOWN
BHLHB2          -1.7                               DOWN
CLC             -2.3                               DOWN
FCER1A          -2.5                               DOWN
SRGAP3          -2.6                               DOWN
FLJ43093        -2.8                               DOWN
CCR3            -2.9                               DOWN
EMR4            -3.0                               DOWN
ZNF792          -3.1                               DOWN
C10orf.3        -3.5                               DOWN
CACNG6          -3.8                               DOWN
P2RY10          -4.2                               DOWN
GATA2           -4.6                               DOWN
EMR4P           -6.6                               DOWN
ESPN            -7.0                               DOWN
EMR4            -9.3                               DOWN

[0075] The 144 Illumina transcripts showed good sensitivity (above 80%) and specificity (above 90%) in all three independent cohorts from our study (Training, Test and Validation Sets) and when using an external cohort from the Maertzdorf et al study. The 100 Agilent transcripts from the Maertzdorf et al 2012 study (8) were also tested. Only 76 of these transcripts were recognised as genes by the NIH DAVID Gene ID Conversion Tool. The same SVM parameters as used earlier were then applied using the Maertzdorf et al transcripts in our three independent cohorts (Training, Test and Validation Sets). The sensitivity however was much lower (45-56%), with similar specificity (above 90%). The 50 genes from the Koth et al 2011 (7) study, run using an Affymetrix platform, were also tested. The same SVM parameters were again applied to all our independent cohorts (Training, Test and Validation Sets). The sensitivity of this gene list was also lower (45-75%) than for our 144-transcripts, with similar specificity (above 87%). Neither the Koth et al 2011 (7) nor the Maertzdorf et al 2012 (8) study reported testing its derived gene list in independent cohorts. As the present study tested the 144-transcripts list from the present applicants (Bloom, O'Garra et al., to be submitted) in both internal and external independent cohorts, this is likely to have improved the validity of the transcript list as a discriminative marker, and may explain why there was little overlap between their gene lists or with the present applicants' 144 gene list (Figure E10). Tables 3, 4 and 5. Class prediction. Class prediction was performed using support vector machines (SVM).

[0076] Table 2 (above) shows the 144 transcripts derived from the Training Set which were then used to build the SVM model; the model was then run in the other four cohorts (Table 3).
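The sensitivity and specificity figures reported for the class prediction can be read as one-vs-rest counts: a sample is either called TB or not, and the calls are compared with the known diagnosis. The following is a minimal sketch of that arithmetic only, not of the SVM itself. The function name, the label strings and the toy cohort are illustrative assumptions and are not taken from the study.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="TB"):
    """Return (sensitivity, specificity) for a one-vs-rest call of `positive`.

    Assumption: any predicted label other than `positive` counts as "not TB".
    """
    tp = fn = tn = fp = 0
    for truth, call in zip(true_labels, predicted_labels):
        if truth == positive:
            if call == positive:
                tp += 1  # TB sample correctly called TB
            else:
                fn += 1  # TB sample missed
        else:
            if call == positive:
                fp += 1  # non-TB sample wrongly called TB
            else:
                tn += 1  # non-TB sample correctly rejected
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: these labels are illustrative, not data from the study.
truth = ["TB", "TB", "TB", "TB", "sarcoid", "control", "pneumonia", "cancer"]
calls = ["TB", "TB", "TB", "not TB", "not TB", "not TB", "TB", "not TB"]
sens, spec = sensitivity_specificity(truth, calls)
print(f"sensitivity={sens:.0%} specificity={spec:.0%}")
```

In this reading, sensitivity is the fraction of TB samples that the model calls TB, and specificity is the fraction of all other samples (controls and other lung diseases) that it does not call TB.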

The 144 transcripts derived from the Training Set in this present study, Bloom et al (Illumina), were tested in the cohorts below:

Cohort                                                                   Sensitivity   Specificity
Present study Training Set (controls, TB, sarcoid, cancer, pneumonia)    88%           94%
Present study Test Set (controls, TB, sarcoid, cancer, pneumonia)        82%           91%
Present study Validation Set (controls, TB, sarcoid)                     88%           92%
Maertzdorf et al (controls, TB, sarcoid)                                 88%           97%

[0077] Table 4 (below). The 100 Agilent transcripts from the Maertzdorf et al study (8) translated to 76 recognised genes using the DAVID gene converter. The SVM model was built in the Training Set and run in the Test and Validation Sets.

The 76 recognised genes out of the 100 probes from the Maertzdorf et al study (Agilent) were tested in the cohorts below:

Cohort                                                                   Sensitivity                              Specificity
Present study Training Set (controls, TB, sarcoid, cancer, pneumonia)    56%                                      96%
Present study Test Set (controls, TB, sarcoid, cancer, pneumonia)        45%                                      92%
Present study Validation Set (controls, TB, sarcoid)                     75%                                      92%
Maertzdorf et al (controls, TB, sarcoid)                                 88% (as stated in their publication)     97% (as stated in their publication)

[0078] Table 5 (below) shows the 50 genes from the Koth et al study (7) that were used to build the last SVM model in the Training Set and run in the Test and Validation Sets. N/A = not applicable.

The 50 genes from the Koth et al study (Affymetrix) were tested in the cohorts below:

Cohort                                                                   Sensitivity                      Specificity
Present study Training Set (controls, TB, sarcoid, cancer, pneumonia)    75%                              92%
Present study Test Set (controls, TB, sarcoid, cancer, pneumonia)        45%                              87%
Present study Validation Set (controls, TB, sarcoid)                     50%                              92%
Koth et al (sarcoid and all cohorts from Berry et al study)              Not shown in their publication   Not shown in their publication

[0079] Table 6 (below). The top 50 differentially expressed transcripts for each disease compared to matched controls (from the present applicants' study). Differentially expressed genes were derived from the Training Set by comparing each disease to healthy controls matched for ethnicity and gender: TB=2524, active sarcoidosis=1391, pneumonia=2801 and lung cancer=1626 transcripts (>1.5 fold change from the mean of the controls, Mann Whitney Benjamini Hochberg p<0.01).

TB                          Active sarcoidosis          Pneumonia                   Lung cancer
Fold Change  Gene Symbol    Fold Change  Gene Symbol    Fold Change  Gene Symbol    Fold Change  Gene Symbol
21           ANKRD22        8.1          FCGR1A         15.8         OLFM4          6.1          ARG1
18.5         FCGR1A         7.9          ANKRD22        12.7         LTF            5.5          TPST1
17.4         SERPING1       7.4          FCGR1C         12.6         VNN1           5.4          FCGR1A
15.1         BATF2          7.1          FCGR1B         12.4         HP             5.2          C19orf59
14.9         FCGR1C         6.4          SERPING1       12.3         DEFA4          4.6          SLPI
13.7         FCGR1B         6.2          FCGR1B         11.3         OPLAH          4.5          FCGR1B
13.3         ANKRD22        6            BATF2          11.2         CEACAM8        4.3          IL1R1
13.1         FCGR1B         5.5          GBP5           11           DEFA1B         4.1          FCGR1C
10.8         LOC728744      5.3          GBP            10.1         ELANE          4.1          TDRD9
10           IFITM3         5.1          IFIT3          9.4          C19orf59       4.1          SLC26A8
9.5          EPSTI1         5            ANKRD22        9.2          ARG1           4.1          FCGR1B
8.7          GBP5           4.9          LOC728744      8.7          CDK5RAP2       4.1          CLEC4D
8.7          IFI44L         4.8          GBP            8.6          DEFA1B         4            LOC100132858
8.4          GBP6           4.8          EPSTI1         8.4          DEFA3          3.9          SLC22A4
8.1          GBP            4.6          IFI44L         8.3          DEFA1B         3.8          LOC100133177
7.8          LOC400759      4.6          INDO           8.1          FCGR1A         3.7          SIPA1L2
7.7          IFIT3          4            IFITM3         7.9          MMP8           3.6          ANXA3
7.6          AIM2           4            GBP6           7.4          FCGR1B         3.6          LIMK2
7.3          SEPT4          4            RSAD2          7.3          SLPI           3.5          TMEM88
7.1          C1QB           3.9          DHRS9          7.2          SLC26A8        3.5          MMP9
6.9          GBP1           3.7          TNFAIP6        7.1          MAPK14         3.5          ASPRV1
6.9          RSAD2          3.7          IFIT3          7.1          CAMP           3.5          MANSC1
6.4          RTP4           3.5          P2RY14         6.7          NLRC4          3.5          TLR5
6.1          CARD17         3.4          DHRS9          6.4          FCAR           3.5          CD163
5.9          IFIT3          3.4          IDO1           6.3          RNASE3         3.4          CAMP
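The selection rule quoted in the Table 6 caption (more than 1.5 fold change from the mean of the controls, with a Benjamini Hochberg corrected Mann Whitney p<0.01) can be sketched as a two-step filter. This is an illustrative sketch under stated assumptions, not the GeneSpring procedure used in the study: the raw Mann Whitney p-values are assumed to have been computed beforehand, fold changes are assumed to be signed in the way the tables report them (for example -2.3 for a down-regulated transcript), and the function names `benjamini_hochberg` and `select_transcripts` are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_transcripts(fold_changes, pvalues, fc_cutoff=1.5, alpha=0.01):
    """Indices of transcripts passing both the fold-change and corrected p cut-offs.

    Assumption: a signed fold change of +2.0 or -2.0 both count as a 2-fold change.
    """
    adjusted = benjamini_hochberg(pvalues)
    return [
        i
        for i, (fc, q) in enumerate(zip(fold_changes, adjusted))
        if abs(fc) > fc_cutoff and q < alpha
    ]


# Hypothetical values for four transcripts (not data from the study).
raw_p = [0.001, 0.002, 0.04, 0.5]
fold = [2.1, -2.3, 3.0, 5.0]
print(select_transcripts(fold, raw_p))
```

The third and fourth hypothetical transcripts have large fold changes but are dropped because their corrected p-values exceed 0.01, which is why both criteria are applied together.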


TABLE 9-continued
Interferon inducible genes from Berry, et al. (5).

Symbol
STAT1
STAT2
TAP1
TAP2

[0082] FIG. 10B is a Venn diagram comparing the genes that distinguish between TB, sarcoidosis, pneumonia and lung cancer, versus TB, active sarcoidosis, non-active sarcoidosis, pneumonia and lung cancer. The overlapping 1359 genes are included in the attached electronic table.

TABLE 10
List of Genes Downregulated in TB versus Active Sarcoidosis

Symbol      Fold change TB vs Active Sarcoid   Regulation
MXD4        -1.5                               DOWN
MEF2D       -1.6                               DOWN
ZSCAN18     -1.6                               DOWN
BHLHB2      -1.7                               DOWN
CLC         -2.3                               DOWN
FCER1A      -2.5                               DOWN
SRGAP3      -2.6                               DOWN
FLJ43093    -2.8                               DOWN
CCR3        -2.9                               DOWN
EMR4        -3                                 DOWN
ZNF792      -3.1                               DOWN
C10orf.3    -3.5                               DOWN
CACNG6      -3.8                               DOWN
P2RY10      -4.2                               DOWN
GATA2       -4.6                               DOWN
EMR4P       -6.6                               DOWN
ESPN        -7                                 DOWN
EMR4        -9.3                               DOWN
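The two-list comparison of FIG. 10B is a set overlap. Assuming the two lists are the 1446-transcript and 1396-transcript signatures described earlier in this document (an inference from the text, not an explicit statement in the figure description), an overlap of 1359 leaves 1446 - 1359 = 87 transcripts specific to the first list, which is consistent with the 87 genes listed for FIG. 10B in Table 11, and 1396 - 1359 = 37 specific to the second. The sketch below shows the operation on toy identifiers; the function name `venn_two` and the identifiers are hypothetical.

```python
def venn_two(list_a, list_b):
    """Split two identifier lists into the three regions of a 2-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


# Toy identifiers standing in for transcript IDs (not real probes).
signature_without_split = ["t1", "t2", "t3", "t4", "t5"]
signature_with_split = ["t3", "t4", "t5", "t6"]
regions = venn_two(signature_without_split, signature_with_split)
print({name: sorted(ids) for name, ids in regions.items()})

# Count check under the assumption stated above.
unique_to_1446 = 1446 - 1359
unique_to_1396 = 1396 - 1359
print(unique_to_1446, unique_to_1396)
```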

TABLE 11
List of 87 genes of FIG. 10B.

Probe ID    Probe Sequence                                        Symbol
3460168     GCTGCTTTTAGGTTAACCACAAAGGAACAACTCAGGATCAGTCGTGATTG    PHF20L1
6180497     TACTGAAAGACTTTTGCCTAAAGTGGCATTATTGACTGCTGGTGTGATGA    LOC400304
6400148     GAATACTTCTCTTGCTGAGAGCCGATGCCCGTCCCCGGGCCAGCAGGGAT    SELM
1850041     TCAGACTCCCTGCCACCTTTTCCCCTGGGTTCTCCCGTCTTGCCTCACTT    DPM2
2690561     CATGGGCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAA    RPLP1
1400747     GCGGAAGAGGAGCCGCTGGAACCAAGACACAATGGAACAGAAGACAGTGA    SF1
7650451     AGTGTCCTCGACATCCCAGGGGAAAGCAAGAGCAGTGAGCCTGAGCAGTG    ZNF683
3850632     GAGCCGCCAGGAACCCTCCTCCTGTCAATGGGGGTGTAGTATTTTTGCCA    CTTN
4880600     CCCCTTGAGAATGGTGATCCACCCAGTTACAGGGGCATTTAGGGAGCAGA    PTCRA
1780008     GCAAGAAAGTCTAACCTATTCCGGTGTTCTCTCTCCCATGAGACAAGCCG    SNORA28
7400475     TGTTAGCCCTGAAGATCTGGCTACCCCAATAGGAAGGCTGAAGGTTTCCC    RPGRIP1
7510367     TGCCCCCTGACTGATAGCATTTCAGAATGTGTCTTTTGAAGGGCTATACC    GPR160
1850035     CAGAGGCAGGAAAAGCAAGGAGCCAGAATTAAGAGGTTGGGTCAGTCTGC    PPIA
4040546     AGGACGTGATCCTGCTTGGGGACTTCAATGCTGACTGCGCTTCACTGACC    DNASE1L1
            GCTGATCTGGCAGGATGCTCTCTTCAAGCATATCCAAAACCAGATGTGCC    HEMGN
4390487     GAGCAGGGGAGAAATAGCAGAGGGGCTTGGAGGGTCACATAGGTAGATGG    RAB13
2320047     ACATGGCCCGCAAGGACAATGAATCCACTCACATTGCAGAACAATTCCGA    NFIA
2600187     GTGAGCCCAAAGTTCTGAAAGGTGTTGCGGCTCCTTCGCCTTCGTCAAAT    LOC728843
5090630     CCCTGCCCTCATGTTGCTTTGGGTCTAGTGGAGGAGAGAGACAGATAAGC
76.1 OfSO   CTCCTGCCACCCAGTGGCCTCTTTAGGCCAAGCTCATGCCTCACAAGGGC    LOC100134660


[0083] Thus, in certain embodiments, the present invention includes the identification and/or differentiation of pulmonary diseases using the genes in the Tables of the present invention. Specifically, the skilled artisan will be able to differentiate the pulmonary diseases using 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 144, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, or even 1,446 genes listed in the tables contained herein and filed herewith (genes, probes, and SEQ ID NOs incorporated herein by reference). The genes may be selected based on ease of use or accessibility, based on the genes that are most predictive (e.g., using the tables of the present invention), and/or based on the order of importance, from top to bottom, of the lists provided for use in the analysis.

[0084] Study population and inclusion criteria. The majority of the TB patients were recruited from Royal Free Hospital, NHS Health Care Trust, London. The sarcoidosis patients were recruited from Royal Free Hospital, John Radcliffe Hospital in Oxford, St Mary's Hospital, Imperial College NHS Health Care Trust, and Barnet Hospital in London and the Avicenne Hospital in Paris. The pneumonia patients were recruited from Royal Free Hospital, NHS Health Care Trust, London. The lung cancer patients and 5 of the TB patients in the Test Set were recruited by the Lyon Collaborative Network, France. All patients were recruited consecutively over time such that the Training Set was recruited first, followed by the Test Set, the Validation Set and lastly the patients' samples that were used in the cell purification. Additional blood gene expression data were obtained from pulmonary and latent TB patients recruited and analysed in our earlier study, and additionally reanalysed in the current study, as presented in FIG. 6C (11).

[0085] The inclusion criteria were specific for each disease. Pulmonary TB patients: culture confirmed Mycobacterium tuberculosis in either sputum or bronchoalveolar lavage; pulmonary sarcoidosis: diagnosis made by a sarcoidosis specialist, granulomas on biopsy, compatible clinical and radiological findings (within 6 months of recruitment) according to the WASOG guidelines (9); community acquired pneumonia patients: fulfilled the British Thoracic Society guidelines for diagnosis (10); lung cancer patients: diagnosis by a lung cancer specialist, histological and radiological features consistent with primary lung cancer; healthy controls: their gender, ethnicity and age were similar to the patients, negative QuantiFERON-TB Gold In-Tube (QFT) (Cellestis) test. The exclusion criteria for all patients and healthy controls included significant other medical history (including any immunosuppression such as HIV infection), age below 18 years or pregnancy. Patients were recruited between September 2009 and March 2012. Patients were recruited before commencing treatment unless otherwise stated. This study was approved by the Central London 3 Research Ethics Committee (09/H071674), with ethical permission from CPP Sud-Est IV, France, and CCPPRB, Pitié-Salpêtrière Hospital, Paris. All participants gave written informed consent.

[0086] IFNγ release assay testing. The QFT M. tuberculosis antigen specific IFN-gamma release assay (IGRA) (Cellestis) was performed according to the manufacturer's instructions.

[0087] Gene expression profiling. 3 ml of whole blood were collected into Tempus tubes (Applied Biosystems/Ambion) by standard phlebotomy, vigorously mixed immediately after collection, and stored between -20 and -80° C. before RNA extraction. RNA was isolated using 1.5 ml whole blood and the MagMAX-96 Blood RNA Isolation Kit (Applied Biosystems/Ambion) according to the manufacturer's instructions. 250 µg of isolated total RNA was globin reduced using the GLOBINclear 96-well format kit (Applied Biosystems/Ambion) according to the manufacturer's instructions. Total and globin-reduced RNA integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies). RNA yield was assessed using a NanoDrop 8000 spectrophotometer (NanoDrop Products, Thermo Fisher Scientific). Biotinylated, amplified antisense complementary RNA (cRNA) targets were then prepared from 200-250 ng of the globin-reduced RNA using the Illumina CustomPrep RNA amplification kit (Applied Biosystems/Ambion). 750 ng of labelled cRNA was hybridized overnight to Illumina Human HT-12 V4 BeadChip arrays (Illumina), which contained more than 47,000 probes. The arrays were washed, blocked, stained and scanned on an Illumina iScan, as per the manufacturer's instructions. GenomeStudio (Illumina) was then used to perform quality control and generate signal intensity values.

[0088] Cell purification and RNA processing for microarray. Whole blood was collected in sodium heparin. Peripheral blood mononuclear cells (PBMCs) were separated from the granulocytes/erythrocytes using a Lymphoprep™ (Axis-Shield) density gradient. Monocytes (CD14+), CD4+ T cells (CD4+) and CD8+ T cells (CD8+) were isolated sequentially from the PBMCs using magnetic antibody-coupled (MACS) whole blood beads (Miltenyi Biotec, Germany) according to the manufacturer's instructions. Neutrophils were isolated from the granulocyte/erythrocyte layer after red blood cell lysis using the CD15+ MACS beads (Miltenyi Biotec, Germany). RNA was extracted from whole blood (5 Prime PerfectPure Kit) or separated cell populations (Qiagen RNeasy Mini Kit). Total RNA integrity and yield were assessed as described above. Biotinylated, amplified antisense complementary RNA (cRNA) targets were then prepared from 50 ng of total RNA using the NuGEN WT-Ovation™ RNA Amplification and Encore BiotinIL Module (NuGEN Technologies, Inc). Amplified RNA was purified using the Qiagen MinElute PCR purification kit (Qiagen, Germany). cRNA was then handled as described above.

[0089] Raw data processing. Microarray raw data were processed using GeneSpring GX version 11.5 (Agilent Technologies) and the following was applied to all analyses. After background subtraction each probe was attributed a flag to denote its signal intensity detection p-value. Flags were used to filter out probe sets that did not result in a present call in at least 10% of the samples, where the present lower cut-off = 0.99. Signal values were then set to a threshold level of 10, log2 transformed, and per-chip normalised using a 75th percentile shift algorithm. Next, per-gene normalisation was applied by dividing each messenger RNA transcript by the median intensity of all the samples. All statistical analysis was performed after this stage. Raw microarray data has been deposited with GEO (Accession number GSE). All data collected and analysed in the experiments adhere to the Minimal Information About a Microarray Experiment (MIAME) guidelines.

[0090] Data analysis. GeneSpring 11.5 was used to select transcripts that displayed expression variability from the median of all transcripts (unsupervised analysis). A filter was set to include only transcripts that had at least twofold changes from the median and were present in at least 10% of the samples. Unsupervised analysis was used to derive the 3422-tran

Scripts. Applying a non-parametric statistical filter (Kruskal method, kit, reagent, or composition of the invention, and Wallistest with a FDR (Benjamini Hochberg)=0.01), after the Vice versa. Furthermore, compositions of the invention can be unsupervised analysis, generated the 1446-transcript and used to achieve methods of the invention. 1396-transcript signatures. The two signatures differed only 0096. It will be understood that particular embodiments in which groups the statistical filter was applied across; 1446, described herein are shown by way of illustration and not as five groups (TB, sarcoidosis, pneumonia, lung cancer and limitations of the invention. The principal features of this controls) and 1396, six groups (TB, active sarcoidosis, non invention can be employed in various embodiments without active sarcoidosis, pneumonia, lung cancer and controls). departing from the scope of the invention. Those skilled in the 0091 Differentially expressed genes for each disease art will recognize, or be able to ascertain using no more than were derived by comparing each disease to a set of controls routine experimentation, numerous equivalents to the specific matched for ethnicity and gender within a 10% difference. procedures described herein. Such equivalents are considered GeneSpring 11.5 was used to select transcripts that were>1.5 to be within the scope of this invention and are covered by the fold different in expression from the mean of the controls and claims. statistically significant (Mann Whitney unpaired FDR (Ben 0097 All publications and patent applications mentioned jamini Hochberg)=0.01). Comparison Ingenuity Pathway in the specification are indicative of the level of skill of those Analysis (IPA) (Ingenuity Systems, Inc., Redwood, Calif.) skilled in the art to which this invention pertains. 
All publi was used to determine the most significant canonical path cations and patent applications are herein incorporated by ways associated with the differentially expressed genes of reference to the same extent as if each individual publication each disease relative to the other diseases (Fisher's exact FDR or patent application was specifically and individually indi (Benjamini Hochberg)=0.05). The bottom X-axis and bars of cated to be incorporated by reference. each comparison IPA graph indicated the log(p-value) and the 0098. The use of the word “a” or “an when used in con top X-axis and line indicated the percentage of genes present junction with the term “comprising in the claims and/or the in the pathway. specification may mean “one but it is also consistent with 0092 Molecular distance to health (MDTH) was deter the meaning of"one or more.” “at least one.” and “one or more mined as previously described (12), and then applied to dif than one.” The use of the term 'or' in the claims is used to ferent transcriptional signatures. Transcriptional modular mean “and/or unless explicitly indicated to refer to alterna analysis was applied as previously described (12). The raw tives only or the alternatives are mutually exclusive, although expression levels of all transcripts significantly detected from the disclosure supports a definition that refers to only alter background hybridisation were compared between each natives and “and/or.” Throughout this application, the term sample and all the controls present in that dataset. The per “about is used to indicate that a value includes the inherent centage of significantly expressed genes in each module were variation of error for the device, the method being employed represented by the colour intensity (Student t-test, p<0.05), to determine the value, or the variation that exists among the red indicates overexpression and blue indicates underexpres study Subjects. Sion. 
The mean percentage of significant genes and the mean 0099. As used in this specification and claim(s), the words fold change of these genes compared to the controls in speci “comprising (and any form of comprising, such as "com fied modules were also shown in graphical form. MDTH and prise' and "comprises”), “having (and any form of having, modular analysis were calculated in Microsoft Excel 2010. such as “have and “has'), “including' (and any form of GraphPad Prism version 5 for Windows was used to generate including, such as “includes and “include’) or “containing the graphs. (and any form of containing, such as “contains and “con 0093. Differentially expressed genes between the Training tain’) are inclusive or open-ended and do not exclude addi Set TB patients and active sarcoidosis patients were derived tional, unrecited elements or method steps. using the non-parametric Significance Analysis of Microar 0100. The term “or combinations thereof as used herein rays (q<0.05) and >1.5 fold expression change. Class predic refers to all permutations and combinations of the listed items tion was performed within GeneSpring 11.5 using the preceding the term. For example, "A, B, C, or combinations machine learned algorithm support vector machines (SVM). thereof is intended to include at least one of A, B, C, AB, The model was built using sample classifiers TB or not AC, BC, or ABC, and if order is important in a particular TB. The SVM model should be built in one study cohort and context, also BA, CA, CB, CBA, BCA, ACB, BAC, or CAB. run in an independent cohort to prevent over-fitting the pre Continuing with this example, expressly included are combi dictive signature. This was possible for all the cohorts from nations that contain repeats of one or more item or term, Such our study. Where the study cohorts used a different microar as BB, AAA, AB, BBC, AAABCCCC, CBBAAA, ray platform the SVM model had to be re-built in that cohort. CABABB, and so forth. 
The skilled artisan will understand To reduce the effects of over-fitting the same SVM param that typically there is no limit on the number of items or terms eters were always used. The kernel type used was linear, in any combination, unless otherwise apparent from the con maximum iterations 100,000, cost 100, ratio 1 and validation text. In certain embodiments, the present invention may also type N-fold where N=3 with 10 repeats. include methods and compositions in which the transition 0094 Univariate and multivariate regression analysis phrase “consisting essentially of or "consisting of may also were calculated using STATA 9 (StataCorp 2005. Stata Sta be used. tistical Software: Release 9. College Station, Tex.; StataCorp 0101. As used herein, words of approximation such as, LP). In the multivariate regression analysis where there were without limitation, “about”, “substantial” or “substantially missing data points (serum ACE and HRCT disease activity) refers to a condition that when so modified is understood to to prevent list-wise deletion dummy variable adjustment was not necessarily be absolute or perfect but would be considered used. close enough to those of ordinary skill in the art to warrant 0095. It is contemplated that any embodiment discussed in designating the condition as being present. The extent to this specification can be implemented with respect to any which the description may vary will depend on how great a US 2015/03 15643 A1 Nov. 5, 2015 36 change can be instituted and still have one of ordinary skilled 01.07 5. Berry M. P. Graham C M, McNab F W, Xu Z, in the art recognize the modified feature as still having the Bloch SA, Oni T, Wilkinson KA, Banchereau R, Skinner required characteristics and capabilities of the unmodified J. Wilkinson RJ, Quinn C, Blankenship D, Dhawan R, feature. In general, but Subject to the preceding discussion, a Cush JJ, Mejias A. Ramilo O. 
Kon O M, Pascual V, numerical value herein that is modified by a word of approxi- Banchereau J. Chaussabel D, O'Garra A. An interferon mation such as “about may vary from the stated value by at inducible neutrophil-driven blood transcriptional signature least +1, 2, 3, 4, 5, 6, 7, 10, 12 or 15%. oios"." bytes Nature E""E J. A 0102 All of the compositions and/or methods disclosed b. Pascual V, aussabel L. Banchereau J. and claimed herein can be made and executed without undue Revgenomic Immunol approach 2010: to 28:535-571 human autoimmune diseases. Annu experimentation in light of the present disclosure. While the 0109 7. Koth L L Solberg O D Peng J C, Bhakta NR compositions and methods of this invention have been Nguyen C P. WoodruffP G. Sarcoidosis blood transcrip. described in terms of preferred embodiments, it will be appar- tome reflects lung inflammation and overlaps with tuber ent to those of skill in the art that variations may be applied to culosis. Am J Respir Crit Care Med2011; 184:1153-1163. the compositions and/or methods and in the steps or in the (0.110) 8. Maertzdorf J. Weiner J, 3rd, Mollenkopf H J, sequence of steps of the method described herein without Bauer T. Prasse A, Muller-Quernheim J. Kaufmann S H. departing from the concept, spirit and scope of the invention. Common patterns and disease-related signatures in tuber All Such similar substitutes and modifications apparent to culosis and sarcoidosis. Proc Natl Acad Sci USA 2012; those skilled in the art are deemed to be within the spirit, 109:7853-7858. Scope and concept of the invention as defined by the appended 0111 9. WASOG. Consensus conference: Activity of sar claims. coidosis. Third wasog meeting, los angeles, USA, Sep. 8-11, 1993. Eur Respir J 1994; 7:624-627. REFERENCES (O112 10. Pankla R, Buddhisa S. Berry M, Blankenship D M. Bancroft G. J. Banchereau J, Lertmemongkolchai G, (0103) Rio sobal tuberculosis control. World health Chaussabel D. 
Genomic transcriptional profiling identifies Organ1Sauon. a candidate blood biomarker signature for the diagnosis of (01.04] 2. Newman LS, Rose CS, Bresnitz EA, Rossman M septicemic melioidosis. Genome Biol 2009; 10:R127. P. Barnard J. Frederick M. Terrin ML Weinberger SE, 0.113 11. Guiducci C, Gong M, Xu Z, Gill M, Chaussabel Moller D R, McLennan G. Hunninghake G. DePalo L. D, Meeker T, Chan J. H. Wright T. Punaro M, Bolland S. Baughman RP. lannuzzi MC. Judson MA, Knatterud GL, Soumelis V. Banchereau J. Coffman R L. Pascual V. Barrat Thompson B W, Teirstein A S. Yeager H, Jr., Johns C J, F.J. Tir recognition of self nucleic Rabin D L. Rybicki B A, Cherniack R. A case control 0114 12. Bloom C I. Graham C M, Berry M P Wilkinson etiologic study of sarcoidosis: Environmental and occupa- KA, Oni T. Rozakeas F. Xu Z, Rossello-Urgell J. Chaussa tional risk factors. Am J Respir Crit Care Med 2004; 170: bel D Banchereau J Pascual V. Lipman M Wilkinson RJ 1324-1330. - 0 O'Garra A. Detectable changes in the blood transcriptome 0105 3. Iannuzzi MC, Rybicki BA, Teirstein A S. Sar- are present after two weeks of antituberculosis therapy. coidosis. N Engl J Med 2007: 357:2153-2165. PLoS One 2012; 7:e46191. 0106 4. Anderson SR, Maguire H. Carless J. Tuberculosis 0115 13. Oliveros, J. C. (2007) VENNY. An interactive in london: A decade and a half of no decline corrected. tool for comparing lists with Venn Diagrams. bioinfogp. Thorax 2007; 62:162-167. cnb.csic.es/tools/venny/index.html.
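
As a rough illustration of the raw data processing and unsupervised filtering steps described in the methods above (threshold at 10, log2 transformation, per-chip 75th percentile shift, per-gene median normalisation, then a twofold-from-median filter in at least 10% of samples), the following Python sketch may be helpful. It is not the GeneSpring implementation. The function names, the dictionary layout of the input, the linear-interpolation percentile, and the choice to perform the per-gene "division by the median" as a subtraction on the log2 scale are all assumptions made for illustration only.

```python
import math
from statistics import median


def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def normalise(raw, floor=10.0):
    """raw maps a probe name to a list of signal values, one per sample.

    Returns the same layout holding normalised log2 values.
    """
    probes = list(raw)
    n_samples = len(raw[probes[0]])
    # Step 1: set values below the threshold to the threshold, then log2.
    logged = {p: [math.log2(max(v, floor)) for v in raw[p]] for p in probes}
    # Step 2: per-chip shift, subtracting each sample's 75th percentile.
    shifts = [percentile([logged[p][s] for p in probes], 75)
              for s in range(n_samples)]
    shifted = {p: [logged[p][s] - shifts[s] for s in range(n_samples)]
               for p in probes}
    # Step 3: per-gene normalisation to the median across all samples
    # (assumed here to be a subtraction, since the data are on a log2 scale).
    out = {}
    for p in probes:
        m = median(shifted[p])
        out[p] = [x - m for x in shifted[p]]
    return out


def variable_probes(norm, fold=2.0, min_fraction=0.10):
    """Keep probes that differ from the median by at least `fold`
    (in either direction) in at least `min_fraction` of the samples."""
    cutoff = math.log2(fold)
    keep = []
    for probe, values in norm.items():
        hits = sum(1 for x in values if abs(x) >= cutoff)
        if hits >= min_fraction * len(values):
            keep.append(probe)
    return keep
```

Under these assumptions, calling `variable_probes(normalise(raw))` on a probe-by-sample table would return the probe names that pass the variability filter. The detection-flag filter and the subsequent Kruskal-Wallis and Benjamini-Hochberg steps are not covered by this sketch.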

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 1451

<210> SEQ ID NO 1
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 1
ttgtagattg gagaacacca totaagagct actaagctica ccacct tcc c 50

<210> SEQ ID NO 2
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 2
agaatgagag cacacagac gttaggcatt to ctgctgaa cqtttc.ccc.g 50

<210> SEQ ID NO 3
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 3
gctggcgggg aaccctggga gtagctagtt totttittgc gtacacagag 50

<210> SEQ ID NO 4
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 4
ccacca acct gtatat caag cct cottgcc ccacaaagct tccaaag.ccc 50

<210> SEQ ID NO 5
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 5
actt Catctt CCC caagtgc ggggagtaca aggcatggcg tagagggtgc 50

<210> SEQ ID NO 6
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 6
Cactggagtg atgttgctga C cago.cgttt cct gtacctic tictaagttgg 50

<210> SEQ ID NO 7
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 7
gtcCtcaagt to ct cocc aggcc cagca acggcct ct c goggcc titt 50

<210> SEQ ID NO 8
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 8
ggagttcaag taggaatg Ctggctttga gcc ct ctaca Ctgctggttg 50

<210> SEQ ID NO 9
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 9
tgtc.ccagcc agacittacag aattgggg.tc tdtat cittaa caagaac ccc 50

<210> SEQ ID NO 10
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 10
cggaggcctg togttctgtcc actittgcc.gt ttaat acccg atgagaatcc 50

<210> SEQ ID NO 11
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 11
ttgagccacg cat agtgtca cqc acctgtg atcc.ca.gcta Cttaggaggit 50

<210> SEQ ID NO 12
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 12
tccaacgcat CCCtgctgat C cagaatgtc. acccggalagg atgcaggaac 50

<210> SEQ ID NO 13
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 13
aggcagtttic cqgaalacagt gaccctgaga aggct tcctic ctgagtatgc 50

<210> SEQ ID NO 14
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 14
cagttctgcatctgat accq t ct cottt co ctgaagtctg gcacaccatg 50

<210> SEQ ID NO 15
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 15
gCatctgcgt atcgtgggat aattgacatg agggcttgag agaactic cag 50

<210> SEQ ID NO 16
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 16
tgcatc cata catggcaaa gt catc.ccca ggalacacaga ggggtgcctg 50

<210> SEQ ID NO 17
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 17
agcgct attic cctggagact agggttgtcc taatggc.cg agaggaalacc 50

<210> SEQ ID NO 18
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 18
gcct Ctgcag acc caagt cq t cctgaagtic ggcct ct coa ggcc.ca.gctic 50

<210> SEQ ID NO 19
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 19
citcaatcc td aacatc tagg ctggalacctg. cacacct coc cct cagotcc 50

<210> SEQ ID NO 20
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 20
Cagtggcctic tittaggcc aa gCt catgctt cacaagggcc titt coaggct 50

<210> SEQ ID NO 21
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 21
gggagt cact tatgctitt C aggttaatca gagctatggg tectacaggc 50

<210> SEQ ID NO 22
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 22
tctt actctg. Cagcaa.catg gaggagagtt ttgttgtagtg agtgtgggcg 50

<210> SEQ ID NO 23
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 23
gaagttctaa gacattgatg tdgccaaggit Caacaccgtg atttggcctg 50

<210> SEQ ID NO 24
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 24
gtc.tttittgc cagaagittaa aggctgtc.t.c caagt cc ctd aacticagoag 50

<210> SEQ ID NO 25
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 25
cittct c cct c caccaccc cc agt cqtcago toctitcc citc atttatttitt 50

<210> SEQ ID NO 26
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 26
tgct ct coat cctggat.cgt aaggaggcat cat caggctg tdttic ctgga 50

<210> SEQ ID NO 27
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 27
gaatctg.ccg ccc.ctic catc ttctacct ct gaatggccac cct tag accc 50

<210> SEQ ID NO 28
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 28
aggc cattta ataaaaaggg cacaaagcct gtcagagttt toaacggtgc 50

<210> SEQ ID NO 29
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 29
t cacactic ct gggctctgaa cacacacgcc agctic ct citc tdaag.cgact 50

<210> SEQ ID NO 30
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 30
ttgagatgtt cctgacggcc aaagaggtgg aggagtic cct ggagaggcgt. 50

<210> SEQ ID NO 31
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 31
citt.ccct citt ggctg.ccc cc agg tatttac tdtggagaac attgcatagg 50

<210> SEQ ID NO 32
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 32
gcatcaac cc tdtgct citat gcago catga accoccaatt cogccaa.gca 50

<210> SEQ ID NO 33
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 33
gctgcttitta ggittalaccac aaaggaacaa Ctcaggat.ca gtcgtgattg 50

<210> SEQ ID NO 34
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 34
Ctcgittagca actgcgagca gaagctitt Co. agt cacacat Caaggcgctt 50

<210> SEQ ID NO 35
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 35
Cctctgct ct tdaagc.ccg to agacic cc acaataaaga ataag catgg 50

<210> SEQ ID NO 36
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 36
cc.cgtggaca CCC cagacct gcgaaggatg atcgc.ccgat aaagacggat 50

<210> SEQ ID NO 37
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 37
Cctcgggcag cagaga.gcag atgaalacc cc catgtggtag gCagggttgg 50

<210> SEQ ID NO 38
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 38
aagctict cct ggaccttgga ct cacagctg acc cc cagtg ggcagttcca 50

<210> SEQ ID NO 39
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 39
cc.ca.gc.ccct gaatgaggaa ggat.ca.gaga agaggct act gggggagaat 50

<210> SEQ ID NO 40
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 40
agagtic ctag ttgttgcc ct accctggctic aggct tctgg gct ctgagaa 50

<210> SEQ ID NO 41
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 41
acticgtg tott to9 a toccotic99 to9 citctgttittagtg a catccgtg tect tott at at 99 50

<210> SEQ ID NO 42
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 42
ggagcacct C Caggcttgca aagaaagtga ggcct Cttgg tat cottt CC 50

<210> SEQ ID NO 43
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 43
aactgttgga ggagaa.gctgaaggaggctg aga.ccc.gagc agagtttgcc 50

<210> SEQ ID NO 44
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 44
t caccagacic ttggaccaga C cctgctgga act caacaac Ctgtgagggc 50

<210> SEQ ID NO 45
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 45
c catccaccc attct cagca gactic cagta ttggcacagt cact cactgc 50

<210> SEQ ID NO 46
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 46
tittgcaaagg gccalaatttic cccaaactga acgggct cag gaaatgttcC 50

<210> SEQ ID NO 47
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 47
ggatggagaa t cctggctgc catgtggttg atgc.ca.gc.cc Ctc.ca.gagaa 50

<210> SEQ ID NO 48
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 48
t ccctggagt acgggaaggc tigagctggag attcagaaag acgcc ctgga 50

<210> SEQ ID NO 49
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 49
catcctgatt gtgctgctgg togtotttct c cct cagagc agtgacagoa 50

<210> SEQ ID NO 50
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 50
C CactCctgg ggggaatctg gt caccctga gctgttgaaac aaagttgctic 50

<210> SEQ ID NO 51
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 51
tccagoggtc. tcggcact co colt cacacct c ccaagatga agct caatga 50

<210> SEQ ID NO 52
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 52
gcct coagtg cct tcc.ccc.g tdgaataaac ggtgtgtc.ct gagaalaccac 50

<210> SEQ ID NO 53
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 53
tgggg.c ctitc acctgctic catccagaacat cagcttct co to citt cactic 50

<210> SEQ ID NO 54
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 54
agctgctgca agacagcaat gacagtccac ctg.ccggcct gattic ctgca 50

<210> SEQ ID NO 55
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 55
gcaa.ca.gcgg gttittgcaga cqct Cttctic cagc.cggagc tigggactgtt 50

<210> SEQ ID NO 56
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 56
cctgggtc.ca cca.gc.ccctd acgc.ccct git gcc cactittg taaataaact 50

<210> SEQ ID NO 57
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 57
acattatgct c ctacctic cc ggcagcatct c caggcc cag aactittcticc 50

<210> SEQ ID NO 58
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 58
taagat catgtcactgcact c cagactgag caacagagtg acactittata 50

<210> SEQ ID NO 59
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 59
aagctitt cag aatatgtcag togctgatgta gcatgcttgt togcaattgcc 50

<210> SEQ ID NO 60
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 60
ggagcacgga tictaccacac cattgcatat ttgacacccc titc.cccago c 50

<210> SEQ ID NO 61
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 61
acaactggcc gggcat agtic agtgtcc.cag cac catgtca gitatgct cac 50

<210> SEQ ID NO 62
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 62
acgcagaccc cacact coat ggagtctgga atgcatggga gctgc.cccCC 50

<210> SEQ ID NO 63
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 63
ttcttgacaa citctgctic ct ttgggg.ctgg c tact actgc agg to tccag 50

<210> SEQ ID NO 64
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 64
acagct attic C catatt cta ggagtggcct aagaaatgcg tdttt cagtg 50

<210> SEQ ID NO 65
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 65
citct tcc.gcg acaatt cata t cittgtacgt catcagagat tt catgc.cgg 50

<210> SEQ ID NO 66
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 66
tgcatcatgt gct cot acco togctic taccg cittittctggg to acagaggc 50

<210> SEQ ID NO 67
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 67
gagggtgatg attgggtgtt catacgcttgttgagatgt gcc acccttg 50

<210> SEQ ID NO 68
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 68
ctgtc.cgt.ct cqaatgactg gagtttcc tig cittctgtcac tacacct coc 50

<210> SEQ ID NO 69
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 69
gggg talacac agagtgcc ct tatgaaggag ttggagatcc ticaaggaag 50

<210> SEQ ID NO 70
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 70
gCaggaacaa Cagatgcagg aac aggctgc acagcticago acaac attic C 50

<210> SEQ ID NO 71
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 71
gtcCaggatt gcct cacttg agaCttgcta ggcct ctgct gtgtgctggg 50

<210> SEQ ID NO 72
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 72
gtaactggaa actgtgttgc tictaaccotc ctic cagocct gcago ct coc 50

<210> SEQ ID NO 73
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 73
ggaacacctic Ctcaaaccta C cact cagga atgtttgctg gggc.cgaaag 50

<210> SEQ ID NO 74
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 74
CtgagcCagg atgtgcaata galagcagggg ttgtc.ccctic ticcagatct 50

<210> SEQ ID NO 75
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 75
aggaagc.cag gCaccgatag agc acccagc cccaccc.ctg taaatggaat 50

<210> SEQ ID NO 76
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 76
tactgaaaga Cttittgccta aagtggcatt attgactgct ggtgttgatga 50

<210> SEQ ID NO 77
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 77
gaggct ct ct tdtgagct ct togg accct co tott cacgga cccaactgtg 50

<210> SEQ ID NO 78
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 78
agcggcgCCC gttc.cggc acagacggcc ticggctCca gigggg.cggag 50


<210> SEQ ID NO 79
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 79
cc cataaaac agggtgttgaa aggcatctica gcggctg.ccc cac catggct 50

<210> SEQ ID NO 80
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 80
cgcacacaga tigggatttgg gaataggittt ggittatccala ggagcagtgc 50

<210> SEQ ID NO 81
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 81
agggcc tigtt agccacctica togc ctitt.ccc tdgaccc cta citgggct cac 50

<210> SEQ ID NO 82
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 82
atgc.ca.gaca taacatgtag cagccatact tdcatggaaa citgactacac 50

<210> SEQ ID NO 83
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 83
gaat acttct Cttgctgaga gcc.gatgc cc gtcc.ccgggc cagcagggat 50

<210> SEQ ID NO 84
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 84
gtcgacct ca ccaggcc.cag ct catgctitc tittgcagcct citccaggcc c 50

<210> SEQ ID NO 85
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 85
caaggc cct c titcacactico atc.ttgtagc cccagoagga gct attittct 50

<210> SEQ ID NO 86
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 86
gggalagatgc cct at Ctctic gggtgctgcc tacaacgtgg Ctgtcatctic 50

<210> SEQ ID NO 87
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 87
aaac ccgt.ca cccagat.cgt cagcgc.cgag gcc tdgggta gag caggtga 50

<210> SEQ ID NO 88
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 88
ccacatgatc ccactgcaga tigttitttgta acaccagctg aggagaaacc 50

<210> SEQ ID NO 89
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 89
gccatgctga gagctgggct titcCtctgtg accatc.ccgg cct gtaacat 50

<210> SEQ ID NO 90
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 90
gctgagagct gggctitt.cct CtgttgacCat C cc.ggcctgt alacat atctg 50

<210> SEQ ID NO 91
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 91
gcc.gc.gcgga cccaacgagc cc.gc.gct cag acgc.cccago tocccdaga 50

<210> SEQ ID NO 92
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 92
ttgct cacgt ctdtgc catgttgtcaatgg gtc.ctitt coa acccaagagg 50

<210> SEQ ID NO 93
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 93
tgttct tccc catgtc.ctgg atgcc actgg aagtgcacac togcttgtatg 50

<210> SEQ ID NO 94
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 94
acgct cqt ct atgcct actg gaccatgtga gcc tiggcact tcc ccacaac 50

<210> SEQ ID NO 95
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 95
c cct ggaaag citcc.ccgaca acctic cactg ccattaccca citaggcaagt 50

<210> SEQ ID NO 96
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 96
gaggacgt.cc cqgctgggat galagtctggit ggtgggit cqt aagtttagga 50

<210> SEQ ID NO 97
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 97
Ccagtgggct Ctggcgc.cta tigctctgttgttgctgctt ttgacacaaa 50


<210> SEQ ID NO 98
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 98
tccaaggat.c agc cct gaga gcaggttggit gactittgagg agggcagt cc 50

<210> SEQ ID NO 99
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 99
citccggtgag ccatggagcc cct cittatct gggcc caacc ticaaataacc 50

<210> SEQ ID NO 100
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 100
aactgtggga CCCCtt Caga t t c cctgagg tatggcttgg to act ct cag 50

<210> SEQ ID NO 101
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 101
ggtcaaatgg ggt cacacag aatactaaga gctgttcacc Cagacct cac 50

<210> SEQ ID NO 102
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 102
Cttcagct ct gctgttgggc tiggtgttgttgg acagaaggaa tigaaagcca 50

<210> SEQ ID NO 103
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 103
citcc ttgcca gcc ccacc cc caggtttittg cc catcc toc caatctoraat 50

<210> SEQ ID NO 104
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 104
tgacctggat Ctgacct cac accat cagca gggggcaccc accatgcaca 50

<210> SEQ ID NO 105
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 105
t caggagaga agaagaggaa acaca agagg aaacgc.ctgg to agagcc.c 50

<210> SEQ ID NO 106
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 106
gcatggaggc aggtgttt CC aagggtgtct Catta actgt agctgcaaag 50

<210> SEQ ID NO 107
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 107
tgggtgcaac Ctctgcggca gcc.gactgca talaccalaag ttctic acct a 50

<210> SEQ ID NO 108
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 108
CtcCaacacc gtc.cagttgt ggcagctict c Cagaagtaat agcagctgac 50

<210> SEQ ID NO 109
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 109
ttgc.ca.gagt tttgcctgct gctitt cotcg tdgcct Cttic ttggg tagtg 50

<210> SEQ ID NO 110
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 110
aagtttgttgg gggccaaacc tiggaccc.cg agctt cotcg gtagcagagg 50

<210> SEQ ID NO 111
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 111
c catggtgat ggatggitttg gaaagggaat gttggtgcct tttgttgccac 50

<210> SEQ ID NO 112
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 112
acatggtgag Caggitatic ct atgtc.ccgag gaagcc catt ccacctgagc 50

<210> SEQ ID NO 113
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 113
gaaagcagga ct cacagaac Citgaagt cac ccagact coc agc catcagg 50

<210> SEQ ID NO 114
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 114
Ccagccaaga aggcagccac act agcagag ggtgacalagg acaatgacat 50

<210> SEQ ID NO 115
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 115
gCacctgcag aggcaggctt cagagcticca C cagc catca atgcc aggct 50

<210> SEQ ID NO 116
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 116
gagggcgctg catgtgctgg gtacatttct tccacttgg gaatcagtag 50


<210> SEQ ID NO 117
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 117
gccCtgtgtt cctggagc at ttgttgaccg C caactgaca acatgctagg 50

<210> SEQ ID NO 118
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 118
cc.gc.ct caca agtgcgaagt ctggtaagga caag.cggag agagc.calaga 50

<210> SEQ ID NO 119
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 119
tacatgactic agcatacctg. Ctggtgcaga gctgaagatt ttggagggit c 50

<210> SEQ ID NO 120
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 120
Cttgagagat gag caccagt tacacaagga Cttct titatic cagcgctgg 50

<210> SEQ ID NO 121
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 121
c cccaactitc gcc ctdcc.ca cittgact tca ccaaatc cct tcc toggagac 50

<210> SEQ ID NO 122
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 122
t cagactic cc togccaccttt toccctgggit totg.ccgt.ct togcct cactt 50

<210> SEQ ID NO 123
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 123
gtatactctg gaaggcagtic cctggagcca gtgcc aggcg gatgacagat 50

<210> SEQ ID NO 124
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 124
gcc aggcgga tigacagatgg gaccct cotic tigccaaatg tacct ct cqt 50

<210> SEQ ID NO 125
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 125
ctgttgtacaa accc.cccaaa gtgtc.cctga tigc.ca.gcaac ccttgcacgg 50

<210> SEQ ID NO 126
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 126
aggctcgggg gtCCCC gCdt CCC aggcc.ca giggggatggg gttcgc.gaga 50

<210> SEQ ID NO 127
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 127
tggccacagg agctgaaagc agaagagtgg gatttgatgc Caggcagtgg 50

<210> SEQ ID NO 128
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 128
gcccacct ca ggcagacaca gag cacaatg Ctggggttct Ctt cacact a 50

<210> SEQ ID NO 129
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 129
acatgagctic agagct acco cacaccitt cq gactgcct cq gcc cc cacag 50

<210> SEQ ID NO 130
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 130
gacgtggttc atgtgaaaga tigc.ca.gtggc alacagatttg C cact cacc 50

<210> SEQ ID NO 131
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 131
catgggctitt ggit ctittittg actaalacctic ttittataa.ca togttcaataa 50

<210> SEQ ID NO 132
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 132
tgggaccagc agcacaagtt C cctgtct tc atggggggag tatatgaccC 50

<210> SEQ ID NO 133
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 133
Cacaagatgg cqctgaaagc aaagaaggaa got cotgcct ct cctgaagc 50

<210> SEQ ID NO 134
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 134
tctagggcta gtact tagtt t cacaccc.gg gagctgggag aaaaaac ctg 50

<210> SEQ ID NO 135
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 135
gaattgggca gct catgctt gaga.cccalat Ctc catgatg acctacaagc 50


<210> SEQ ID NO 136
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 136
ggaalatatga gattacggag cagcgcaaga ttgat Cagaa agctgtggac 50

<210> SEQ ID NO 137
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 137
Ccagtggaaa taccgggaac C caaggaccg gagcgaatga ggcctgt at C 50

<210> SEQ ID NO 138
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 138
Ctttctgctt gcattgcc.gt atctgtgcgt. tcc tt catcc togg to ctggc 50

<210> SEQ ID NO 139
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 139
tgcagagaag aaagtgaggc cgggagcctg agcctgggct ggagc ctitct 50

<210> SEQ ID NO 140
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 140
gccatgtcac cagcc cc at tatt Cocag agggit ct tag ticctggaaag 50

<210> SEQ ID NO 141
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 141
agcc.gcaatt gttctgaaaa tdtcaaacga gqcttctgtt ttgcacctgc 50

<210> SEQ ID NO 142
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 142
catgcct citg togc ctitcqct catgctgttt citt cogactg gaatgcc titc 50

<210> SEQ ID NO 143
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 143
ggat CCtgtt gacaccc.caa acc caacaag gaggaagcct gggaagtgcc 50

<210> SEQ ID NO 144
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 144
t catgagatc tdgttttittgaaagtgtgtg gcaagtic ccc ctitcgct ct c 50

<210> SEQ ID NO 145
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 145
ggcc cct ctg taaaacatg cccagcggca aggccaacaa gotgggggat 50

<210> SEQ ID NO 146
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 146
gctgcaggct aac agggagc ccgactitcag cagcctggtg tcaccitctica 50

<210> SEQ ID NO 147
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 147
gcaagaacac ttctggalagg gagagtggat ttggctgggc Ctctggatgg 50

<210> SEQ ID NO 148
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 148
ctacaggit co cct ctdagcc ct ct caccitt gtc.ctgtgga agaagicacag 50

<210> SEQ ID NO 149
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 149
gcc.ccgt.ctic gtgggatt at tatt Cotga t ccaagaagg gtcCtctggg 50

<210> SEQ ID NO 150
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 150
cCaggcc.caa gtcgt.cctica agt cqgcctg. galagtgggcc tigaagagct 50

<210> SEQ ID NO 151
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 151
ggtgagagag aaggaalacct togaggagg acatt actgg ttgttctggc 50

<210> SEQ ID NO 152
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 152
tggatggcac cagaggctgc agaaggccaa gaatcaa.gct agaaggccac 50

<210> SEQ ID NO 153
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 153
gctgacgt at tt catggcag ticaagtccaa tigcagcgt.c titcgt.ccggg 50

<210> SEQ ID NO 154
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 154
cc.ccgttitat coatgtgtcc attgacggcc atctatottg cittctitcggc 50


<210> SEQ ID NO 155
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 155
cgggaggcac ggcc.gagatg tacacgaaga Caggagt caa tigagattct 50

<210> SEQ ID NO 156
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 156
gcggaagagg agcc.gctgga accalagacac aatggaacag alagacagtga 50

<210> SEQ ID NO 157
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 157
Cctgagtaac tectgct tc aggat cagct tcagagtct tcc ttittagg 50

<210> SEQ ID NO 158
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 158
gtgggtctga agagcttggit to agaaact tcggggtcta Calaaggcagg 50

<210> SEQ ID NO 159
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 159
gtttggacct gggcgtctgagggc.cccact ttgggaaccg ttgaaatagg 50

<210> SEQ ID NO 160
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 160
gatctgtcac titt citcc.cat cacgcticagg taggaccatc cagttgcaga 50

<210> SEQ ID NO 161
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 161
CCCC caacaa agactgtgca ttcaatacct taatggaact caggtggag 50

<210> SEQ ID NO 162
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 162
agtgtc.ct cq a catcc Cagg ggaaagcaag agcagtgagc Ctgagcagtg 50

<210> SEQ ID NO 163
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 163
agcaat caca aagc.ca.gaga agctgtaagc tigcctg.ccgg gcctgaggag 50

<210> SEQ ID NO 164
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 164
gagdtggctt Caggggaagit gct attcaca ggaccat atc. Caccaccct c 50

<210> SEQ ID NO 165
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 165
ggagggacaa gaatcaatgg ataag.cgtgg gtggaggaag atccaaacag 50

<210> SEQ ID NO 166
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 166
c taggctato ggtgctgctt ctd.cccact t t caggagaac cct gctctgc 50

<210> SEQ ID NO 167
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 167
Cagttittgcc tccagtggaa gCagaaaggg tttitt to agc tigittaaatcC 50

<210> SEQ ID NO 168
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 168
aggccaagga cc.gcagtic ct t cagtaacac cagtgtaaaa gcttgaggag 50

<210> SEQ ID NO 169
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 169
gcct coaaga aatggggotg actittgc.cat catggagatg aagaa.gctgc 50

<210> SEQ ID NO 170
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 170
t catt.cgacg tdgttcgtgt gaaagatgcc aatggcaa.ca gCtttgcc.gc 50

<210> SEQ ID NO 171
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 171
titcc tdggag accatcc.ccc acctittct co ggcct citgag actittgagtic 50

<210> SEQ ID NO 172
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 172
actggcaaag gct aggc.ccg gag cacccta ggcgctggat tittgggacala 50

<210> SEQ ID NO 173
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 173
Ctgggccaga atttcaaacg gcct cactag gct tctggitt gatgcctgtg 50


<210> SEQ ID NO 174
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 174
Cctic cagtgg tttaggcagg accctgggaa aggtot caca totctgttgc 50

<210> SEQ ID NO 175
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 175
ggcc attgta gcc citctgtg togttcacago catcc tdtac togcattcc.gc 50

<210> SEQ ID NO 176
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 176
gttaaatgcg gtgtagcaaa gtt atggggit Ctgcttgagg gCact aacct 50

<210> SEQ ID NO 177
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 177
atgagtgtta taccct aaga gtctgaggag ttgggaagit gct tcgtgtg 50

<210> SEQ ID NO 178
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 178
gagg.cgc.cag galaccCtcct cctgtcaatg ggggtgtagt atttittgc.ca 50

<210> SEQ ID NO 179
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 179
CCCCttgaga atggtgat Co acc cagttac aggggcattt agggagcaga 50

<210> SEQ ID NO 180
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 180
ttgcCagggc gaggcggc cc caatgacaga titt coctitta gacccagcag 50

<210> SEQ ID NO 181
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 181
aggcgacggg aag.cgcgggt get C9gctgg gtCcggctic Ctggagaa.ca 50

<210> SEQ ID NO 182
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 182
gcatcCtcct gtgtatggala gagacaggtg accgctic cag gttgggtgct 50

<210> SEQ ID NO 183
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 183
gagtgtatag ccc catcttg togg taacttg ctgcttctgc actt catatic 50

<210> SEQ ID NO 184
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 184
agcagotcac ccaggagcta gacaccct co goalacct citt cc.gc.cagatt 50

<210> SEQ ID NO 185
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 185
cgtgttct ct ttaagaacac togtgaaact gcc caggcca toaagggitat 50

<210> SEQ ID NO 186
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 186
ggaacacagg caaaccc.gtg attittggtgc ticcittgtaac toagc cctgc 50

<210> SEQ ID NO 187
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 187
gggaatgagt acatggtcct togcct cocca toagtgcctic ttctic caact 50

<210> SEQ ID NO 188
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 188
cCaggc.cgac atgactaagt atctggalagg ct cactgtac CccagcaccC 50

<210> SEQ ID NO 189
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 189
ggtggacgac ttgttacag ccttggctgc gct agtagct gcc titt catg 50

<210> SEQ ID NO 190
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 190
gtcatCcagc aatgagagaa t cctgcct ct gtagaccaac atc.ca.gtgtg 50

<210> SEQ ID NO 191
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 191
gaggctggat gaaatgtcag Ctagggcc at tittggctgct gaggctctgg 50

<210> SEQ ID NO 192
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 192
cittgc.cgatg citcc.ca.gctgaataaag.ccc tdocttctac aacticagtgt 50


<210> SEQ ID NO 193
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 193
gcac catgca tdgagt cago cattt citcta ggaac cittga titcctgtctg 50

<210> SEQ ID NO 194
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 194
gcaagaaagt cta acctatt ccggtgttct citctic ccatg agacaag.ccg 50

<210> SEQ ID NO 195
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 195
tgat Cogaag gaggagtggc gctgggcgct ggacticgctg gtgttgaaaat 50

<210> SEQ ID NO 196
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 196
c cccacgcct gtttgt attg ggagctctgg accaatagtg tct ct coltag 50

<210> SEQ ID NO 197
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 197
tgacgagc cc tict cacagag cacggtctga atctgcacag agcaagatgc 50

<210> SEQ ID NO 198
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 198
atgatacgta aacgct cott tdgalaccca ttcgagggca gcc.ggcggga 50

<210> SEQ ID NO 199
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 199
c caaaaaatt gtc.tctggca at agttacct tcc cagatac aggtocc ccc 50

<210> SEQ ID NO 200
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 200
t caa.gc.ctag togtaaatttic togcatctoac acgactittag tittggcc agg 50

<210> SEQ ID NO 201
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 201
citat atctgg aagggg.cgaa aag.cgaatga gaaggagcgg Caggcagcc.c 50

<210> SEQ ID NO 202
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 202
aatttggc.cg cggttcticgc ticttgtc.gcg tctgctcaaa C cagcacggit 50

<210> SEQ ID NO 203
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 203
C Cactgcctic aagatgcc.ca agat CtcCtg C caaaacaag ggagt cqtgg 50

<210> SEQ ID NO 204
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 204
tggctic ccaa citcctic ccta t cctaaaggc ccactgg cat taaagtgctg 50

<210> SEQ ID NO 205
<211> LENGTH: 50
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 205
tctgttgttggt gttttgtacc ggcacgggat atggaacgala aactgctttg 50

<210> SEQ ID NO 206 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 206 cagagtggag aggcagaaac catgtgcaga ggctgggaga tigctgctgtt 50

<210> SEQ ID NO 207 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 207 gagggggagg gcttggtggg gagggggact tacaatt cat C cacgctgtt 50

<210> SEQ ID NO 208 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 208 gggcattgttg Ctggc.cccac titt cactggc Ctt Cttggitt ttggggggaa 50

<210> SEQ ID NO 209 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 209 ctg.ccct tcc caggtgattic totaagttgt ccct caactg tacttggaga 50

<210> SEQ ID NO 210 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 210 ggctittattt toactggc cc ctdaggttga agt cagagtic ticcaaaaaac 50

<210> SEQ ID NO 211 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 211 gtaatagoca aaccocactic togttggtagc aattggcago cct attt cag 50


<210> SEQ ID NO 212 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 212 gtggcttct c tdtgaattgc ctd taacaca tagtggcttic ticcgc.cc.ttg 50

<210> SEQ ID NO 213 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 213 ggtgagc.cat ggagcc cct c ttatctgggc cca acct ct c cittaaataac 50

<210> SEQ ID NO 214 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 214 ccalaggttct cacactggcc tiggcttggg to ccatata ggaggtotgt 50

<210> SEQ ID NO 215 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 215 gcct cqgaac tdtgacac cc ccatc.ttcga citt cagagtic ttctitccaag 50

<210> SEQ ID NO 216 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 216 tgacga catc gagtgct tcc titatggagct ggagcagc.cc gcc tagalacc 50

<210> SEQ ID NO 217 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 217 atgcatgtca ctaagttgtc atc.ccacata aattgatgtg cagcataggg 50

<210> SEQ ID NO 218 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 218 ggggct coac acctittgctg togtgttctgg ggcaacctac taatcct ct c 50

<210> SEQ ID NO 219 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 219 gaaaagctgt gtcgtgttcc ctgttgaaact gag caggtgt gtgttggcgc 50

<210> SEQ ID NO 220 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 220 tctcaaaacc catggaggga ggctgctggt gtgg taggca galacc taggc 50

<210> SEQ ID NO 221 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 221 acttgagctgaaalacc Caga tiggtgttaac tocc ccc actitt CC ggc 50

<210> SEQ ID NO 222 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 222 accaggtgca gacacgcagt gctgatgagc C catgactac Ctttgtc.tta 50

<210> SEQ ID NO 223 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 223 agtgaggctg agatgacaga ggtggit catg gctgg cacag ggctic aggta 50

<210> SEQ ID NO 224 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 224 cCagggactt C9ggggaaat Ctggagcatt ttt tacaagc Ctt coacttic 50

<210> SEQ ID NO 225 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 225 ggcagocctg gcaaatgaat caaag accca ttcctgttcc tict coccacc 50

<210> SEQ ID NO 226 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 226 gcattggtct ggccaagttct acaatgtc.cc aatat caagg acaaccaccc 50

<210> SEQ ID NO 227 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 227 gttctic Caga ggalaggtgga agaalaccatg ggcaggagta ggaattgagt 50

<210> SEQ ID NO 228 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 228 gatagt ccct tdaacacaca togctg.ccgag citctggaaaa accccacago 50

<210> SEQ ID NO 229 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 229 c cc catcaac cacacagtgc gacgc.cttgt tdocttcacc titt cacc citt 50

<210> SEQ ID NO 230 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 230 gcqcttgttgc ticggatggct tcgcatttico coaat accoc attaaac cqt 50


<210> SEQ ID NO 231 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 231 tgct Cttcag gtttcaggct Caacaagggg catggtctgc ticttgcagat 50

<210> SEQ ID NO 232 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 232 cc catctotg C cactittgat gct atttggg ttatgatggg gcaagatggc 50

<210> SEQ ID NO 233 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 233 ctgacattga acc cagttgc ticagggit cag cccattctta citt coct ggg 50

<210> SEQ ID NO 234 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 234 Cagc cct cta gcagagcgt.c agtgcagtic tittatc.ccg gCttt tacag 50

<210> SEQ ID NO 235 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 235 gact cotgcc ccggttcaac cctaccagct tdtggtaact tactgtcaca 50

<210> SEQ ID NO 236 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 236 caacct gcca cattttggga gcttittctac atgtctgttt tot catctgt 50

<210> SEQ ID NO 237 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 237 ttttgttctica tdgcaacct t c cctdgc.cag attcc togcct gtc.t.cccagc 50

<210> SEQ ID NO 238 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 238 tgggtcacct Ctgggct tcc gcago accct C caaggcgga gtggagcctt 50

<210> SEQ ID NO 239 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 239 gactgtgaaa ccgtcagttc ggaaggctgg ttagaac atg tdggagcaac 50

<210> SEQ ID NO 240 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 240 Ccagcc actic tacticaaggg gcatatattt to atgagg togatagag 50

<210> SEQ ID NO 241 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 241 ggttggaaat accatcagcc titcCttgctic ggccCagg to ttitt Caggc 50

<210> SEQ ID NO 242 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 242 cgaaaagaga aagtgggaaa atgggaagtic cct ctgccta aagtacgtgc 50

<210> SEQ ID NO 243 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 243 gcactitt cat acgcaggcat citcttgttac ctacatctaa gctgttc.ccg 50

<210> SEQ ID NO 244 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 244 cct cqtgttg attgcaggag gag toggaat taaccotctg ctitt.ccatcc 50

<210> SEQ ID NO 245 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 245 ggggct acat ttgttcattt C cagcagtag cataaactta C9gtgacatg 50

<210> SEQ ID NO 246 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 246 t caggtgc cc titatgaaaag gcttgataga gggagtttgt cct gtggcc.c 50

<210> SEQ ID NO 247 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 247 gtagattic cc aagagactitt agcagt cacc agcct taatg catgtacagg 50

<210> SEQ ID NO 248 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 248 ggat attgtc. agt ctittagg ggttgggctg gatgc.cgagg taaaagttct 50

<210> SEQ ID NO 249 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 249 catcc.gc.cgc ggcct cacgt gcgttgtaac aagcc ct cat Cacatgtgtg 50


<210> SEQ ID NO 250 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 250 tgtgct coag gcgacaccat ttgc.catcct gcttctaacg caaac ccct g 50

<210> SEQ ID NO 251 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 251 Cagggaga cc gtgtcagtag ggatgtgtgc ctggctgttgt acgtgggtgt 50

<210> SEQ ID NO 252 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 252 ccagaggagg to Ctacac at taaaggataa agcc.ccc.cag tatgctggc 50

<210> SEQ ID NO 253 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 253 agcactgcag act cagctga gcaccgatac aaagaaagac aaa catcctg 50

<210> SEQ ID NO 254 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 254 aggaaatttg ggcaggaaag ggaact caca gtgtcggaat gcctggagca 50

<210> SEQ ID NO 255 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 255 ggggggattg cct ctact catcactacctg tttctgcctg ttctgctgcc 50

<210> SEQ ID NO 256 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 256 ggttcc to cq atc.ttacagg ct catccagg titccaaagtg cittctgtctic 50

<210> SEQ ID NO 257 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 257 gcatgtgt at gatgtgttgttg C9tcggaccg Cttct aggct act aagtgtc. 50

<210> SEQ ID NO 258 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 258 agcc tatgga gtggc.ccgta aat cagttga Ctgtgtagct Cttgcctggc 50

<210> SEQ ID NO 259 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 259 tgtt atgaca C caagtgact acaagggagg caaga cc cct C caggcct CC 50

<210> SEQ ID NO 260 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 260 caggact tcc ticacaaag.ca toacgtcatc ttgacagcta tdaattic citt 50

<210> SEQ ID NO 261 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 261 gagtgaggtg gtaatcagtg cct Cttggitt to agc.cgtgg toctagtggc 50

<210> SEQ ID NO 262 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 262 citctgttgttt cotgttt cac cqccaccott toaggagaga act acaccag 50

<210> SEQ ID NO 263 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 263 ccggtgatgg acccaccatc aactgcatcc ticcaa.gctgg attcc tagac 50

<210> SEQ ID NO 264 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 264 cCaagcct aa ggtgaaaggg C9ggcagaac Ctgtggtgtt aggaggcaac 50

<210> SEQ ID NO 265 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 265 gatt caccct gtccaaactg. cctaagcc ct c cqccattct caag.ccctgc 50

<210> SEQ ID NO 266 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 266 tgcatggcaa atcctgtcgg tot coagttg gttatctgaa tagtgtcacc 50

<210> SEQ ID NO 267 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 267 gagaatgctg tdgtcc.gcaa Cacccagatc gacaact cot ggggg.tctga 50

<210> SEQ ID NO 268 <211> LENGTH: 50 <212> TYPE: DNA <213> ORGANISM: Artificial Sequence <220> FEATURE: <223> OTHER INFORMATION: Synthetic primer

<400> SEQUENCE: 268 tccagctgct ggaatcct ac catcc cagga ggcaggcaca gcc agggaga 50