DATA AVAILABLE FOR BOLT PERFORMERS

Data Created During BOLT; corpora automatically distributed to performers. Contact [email protected] to obtain any missing corpora

Catalog ID | Title | Release Date | Type | Description | Source Language (arz = Egyptian Arabic; cmn = Mandarin Chinese; eng = English) Where Relevant/Known | Genre Where Relevant/Known | arz data volume (words unless otherwise specified) | cmn data volume (words unless otherwise specified; 1 char = 1.5 words) | eng data volume (words unless otherwise specified)
LDC2011E115 | BOLT ‐ Sample Discussion Forums | 12/22/2011 | Source | Discussion forums sample for elicitation of feedback on format, structure, etc. | arz, cmn, eng | discussion forum | 67815 | 106130 | 142491
LDC2012E04 | BOLT ‐ Phase 1 Discussion Forums Source Data R1 V2 | 3/29/2012 | Source | Discussion forums source data | arz, cmn, eng | discussion forum | 33871338 | 36244922 | 29658002
LDC2012E16 | BOLT ‐ Phase 1 Discussion Forums Source Data R2 | 3/22/2012 | Source | Discussion forum source data | arz, cmn, eng | discussion forum | 118519987 | 264314806 | 273078669
LDC2012E21 | BOLT ‐ Phase 1 Discussion Forums Source Data R3 | 4/24/2012 | Source | Discussion forums source data | arz, cmn, eng | discussion forum | 127832646 | 279763913 | 282588862

LDC2012E54 | BOLT ‐ Phase 1 Discussion Forums Source Data R4 | 5/31/2012 | Source | Discussion forums source data | arz, cmn, eng | discussion forum | 368199350 | 838056761 | 676989452
LDC2012E62 | BOLT ‐ Phase 1 Rejected Training Data Thread IDs | 6/1/2012 | Source | List of threads rejected during triage for BOLT translation training data | n/a | discussion forum | n/a | n/a | n/a
LDC2012E82 | BOLT Phase 1 IR Eval Source Data Document List | 6/29/2012 | Source | List of source documents for IR evaluation | arz, cmn, eng | discussion forum | 400036669 | 400168661 | 400219116
LDC2013E08 | BOLT Phase 2 IR Source Data Document List and Sample Query | 1/31/2012 | Source | Discussion forum source documents for support of P2 IR | arz | discussion forum | 616719471 | n/a | n/a
LDC2013E10 | BOLT Phase 2 SMS and Chat Sample Source Data V1.1 | 3/5/2013 | Source | SMS/chat sample for elicitation of feedback on format, structure, etc. | arz, cmn, eng | SMS, chat | 879 | 8424 | 6709
LDC2013E123 | BOLT Phase 2 SMS/Chat Source Data R4 | 11/15/2013 | Source | SMS/chat source data | arz | SMS, chat | 213516 | n/a | n/a
LDC2013E49 | BOLT Phase 2 SMS/Chat Source Data R1 V2 | 6/4/2013 | Source | SMS/chat source data | arz, cmn | SMS, chat | 1958 | 10029 | n/a
LDC2013E63 | BOLT Phase 2 SMS/Chat Source Data R2 V2 | 7/12/2013 | Source | SMS/chat source data | arz, cmn | SMS, chat | 3829 | 280771 | n/a
LDC2013E84 | BOLT Phase 2 SMS/Chat Source Data R3 | 9/25/2013 | Source | SMS/chat source data | arz, cmn | SMS, chat | 95821 | 1585304 | n/a
LDC2012E11 | BOLT ‐ Phase 1 Translation Samples V2 | 3/6/2012 | Translation | Translation training data sample release for BOLT P1 | arz, cmn | discussion forum | 7 docs | 17 docs | n/a
LDC2012E124 | BOLT Phase 1 Translation Training Data R6 | 10/17/2012 | Translation | incremental training data release | arz, cmn | discussion forum | 320887 | 459588 chars | n/a
LDC2012E15 | BOLT Phase 1 Translation Training Data R1 | 4/19/2012 | Translation | incremental parallel text training data release | arz, cmn | discussion forum | 90581 | 300257 chars | n/a
LDC2012E18 | BOLT Phase 1 HTER Experiment Source and Reference Translation | 3/27/2012 | Translation | Source and translation files for BOLT P1 HTER experiment | arz, cmn | discussion forum | 4792 | 9789 chars | n/a
LDC2012E19 | BOLT Phase 1 Translation Training Data R2 | 4/30/2012 | Translation | incremental parallel text training data release | arz, cmn | discussion forum | 116165 | 52088 chars | n/a
LDC2012E30 | BOLT Phase 1 DevTest Source and Translation V4 | 6/25/2012 | Translation | Source and translation files for BOLT P1 Devtest | arz, cmn | discussion forum | 60296 | 58929 | n/a
LDC2012E55 | BOLT Phase 1 Translation Training Data R3 | 5/31/2012 | Translation | incremental parallel text training data release | arz, cmn | discussion forum | 311487 | 134284 chars | n/a
LDC2012E81 | BOLT Phase 1 Translation Training Data R4 | 6/20/2012 | Translation | incremental parallel text training data release | arz, cmn | discussion forum | 116073 | 253504 chars | n/a
LDC2012E96 | BOLT Phase 1 Translation Training Data R5 | 8/3/2012 | Translation | incremental parallel text training data release | arz, cmn | discussion forum | 214406 | 447263 chars | n/a
LDC2013E118 | BOLT Phase 2 Translation Training Data R3 | 10/11/2013 | Translation | incremental parallel text training data release | arz, cmn | SMS, chat | 7928 | 200024 | n/a
LDC2013E119 | BOLT Phase 2 SMS and Chat DevTest Gold Standard Translation | 10/18/2013 | Translation | gold standard translation release | cmn | SMS, chat | n/a | 5000 | n/a
LDC2013E125 | BOLT Phase 2 Translation Training Data R4 | 11/27/2013 | Translation | incremental parallel text training data release | arz, cmn | SMS, chat | 39796 | 212386 | n/a
LDC2013E132 | BOLT Phase 2 Translation Training Data R5 | 12/20/2013 | Translation | incremental parallel text training data release | cmn | SMS, chat | n/a | 200076 | n/a
LDC2013E135 | BOLT Phase 2 Arabizi Transliteration Translation Experiment | 1/7/2014 | Translation | 4 English translations of an Egyptian Arabic source file | arz | n/a | n/a | n/a | n/a
LDC2013E59 | BOLT Phase 2 Discussion Forum DevTest Gold Standard Translation | 7/3/2013 | Translation | gold standard translation for DevTest discussion forum data | arz, cmn | discussion forum | 4942 | 5044 | n/a
LDC2013E80 | BOLT Phase 2 Translation DevTest Data R1 | 8/9/2013 | Translation | translation files for DevTest data | cmn | SMS, chat | n/a | 11621 | n/a
LDC2013E81 | BOLT Phase 2 Translation Training Data R1 | 8/9/2013 | Translation | incremental parallel text training data release | cmn | SMS, chat | n/a | 10260 | n/a
LDC2013E83 | BOLT Phase 2 Translation DevTest Data R2 | 8/26/2013 | Translation | translation files for DevTest data | cmn | SMS, chat | n/a | 31592 | n/a
LDC2013E85 | BOLT Phase 2 Translation Training Data R2 | 9/13/2013 | Translation | incremental parallel text training data release | cmn | SMS, chat | n/a | 187205 | n/a
LDC2013E92 | BOLT Phase 2 Additional Discussion Forum Translation DevTest Data | 8/23/2013 | Translation | translation for discussion forum DevTest data | arz, cmn | discussion forum | 81,928 | 50168 chars | n/a
LDC2013E94 | BOLT Phase 2 Arabic CTS Translation Data R1 V2 | 3/18/2014 | Translation | incremental parallel text training data release | arz | conversational telephone speech | 122534 | n/a | n/a
LDC2014E08 | BOLT Phase 3 Translation Training Data R1 V2 | 2/20/2014 | Translation | incremental parallel text training data release | cmn | conversational telephone speech | n/a | 199993 | n/a
LDC2014E09 | BOLT Phase 2 Translation DevTest Data R3 | 2/14/2014 | Translation | translation files for DevTest data | arz | SMS, chat | 35937 | n/a | n/a
LDC2014E18 | BOLT Phase 2 Translation Training Data R6 | 2/28/2014 | Translation | incremental parallel text training data release | arz | SMS, chat | 358102 | n/a | n/a
LDC2014E19 | BOLT Phase 3 Translation With Audio Experiment V2 | 3/10/2014 | Translation | translation experiment with audio access | arz, cmn | conversational telephone speech | 3609 | 3718 | n/a
LDC2014E25 | BOLT Phase 2 SMS and Chat DevTest Gold Standard Translation R2 | 3/25/2014 | Translation | gold standard translation for SMS/chat data | arz | SMS, chat | 4987 | n/a | n/a
LDC2013E121 | BOLT Phase 2 Egyptian Arabic SMS and Chat Transliterated Sample Conversations | 10/30/2013 | Transliteration | transliteration of Arabic SMS/chat data | arz | SMS, chat | 40304 | n/a | n/a
LDC2013E131 | BOLT Phase 2 Egyptian Arabic SMS and Chat Transliterated Sample Conversations with Manual Correction | 12/3/2013 | Transliteration | manually corrected transliteration of SMS/chat data | arz | SMS, chat | 3784 | n/a | n/a
LDC2012E24 | BOLT Phase 1 Chinese Parallel Alignment and Tagging Part 1 | 6/8/2012 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 59579 | n/a

LDC2012E51 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment Part 1 V2 | 7/10/2012 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 68762 | n/a | n/a
LDC2012E72 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 2 | 7/10/2012 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 101957 | n/a
LDC2012E94 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 2 v2 | 8/7/2012 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 49334 | n/a | n/a
LDC2012E95 | BOLT Phase 1 ‐ Chinese Parallel Word Alignment and Tagging Part 3 | 8/7/2012 | Word Alignment | Word aligned Chinese discussion forum data | cmn | discussion forum | n/a | 102167 | n/a
LDC2013E01 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF | 1/31/2013 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 38610 | n/a | n/a
LDC2013E02 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part 4 | 1/31/2013 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 166388 | n/a
LDC2013E09 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 4 | 2/28/2013 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 54903 | n/a | n/a
LDC2013E25 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 5 | 3/28/2013 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 98000 | n/a | n/a
LDC2013E31 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 6 | 4/12/2013 | Word Alignment | Word alignment Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 61112 | n/a | n/a

LDC2013E43 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 7 | 5/9/2013 | Word Alignment | Word alignment Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 65054 | n/a | n/a
LDC2013E51 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part 5 | 6/28/2013 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 50933 | n/a
LDC2014E02 | BOLT Phase 2 Chinese Parallel Word Alignment and Tagging SMS/Chat Part 1 | 2/3/2014 | Word Alignment | Word aligned Chinese‐English SMS/chat parallel text | cmn | SMS, chat | n/a | 64670 | n/a
LDC2014E21 | BOLT Phase 2 Chinese Parallel Word Alignment and Tagging SMS/Chat Part 2 | 3/31/2014 | Word Alignment | Word aligned Chinese‐English SMS/chat parallel text | cmn | SMS, chat | n/a | 42851 | n/a
LDC2012E107 | BOLT Phase 1 Egyptian Arabic DF Part 5 V2.0 | 10/31/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 36419 | n/a | n/a
LDC2012E109 | BOLT Phase 1 Chinese Treebank DF Part 1 | 9/14/2012 | Treebank | Chinese Treebank on discussion forum data | cmn | discussion forum | n/a | 179371 | n/a
LDC2012E114 | BOLT Phase 1 English Treebank DF Part 3 V1.0 | 9/12/2012 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 51012
LDC2012E120 | BOLT Phase 1 Chinese Treebank DF Part 2 Version 2.0 | 10/12/2012 | Treebank | Chinese Treebank on discussion forum data | cmn | discussion forum | n/a | 54480 | n/a
LDC2012E125 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 6 V2.0 | 1/15/2013 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 98921 | n/a | n/a
LDC2012E130 | BOLT Phase 1 Chinese Treebank DF Part 3 | 1/7/2013 | Treebank | Chinese Treebank on discussion forum data | cmn | discussion forum | n/a | 99394 | n/a
LDC2012E25 | Arabic Treebank Part 20 V1.0 ‐ BOLT Pilot ARZ Email | 7/30/2012 | Treebank | Egyptian Arabic Treebank on email | arz | email | 3768 | n/a | n/a
LDC2012E28 | Arabic Treebank ARZ Part 1, V1.0 | 5/31/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 36757 | n/a | n/a
LDC2012E88 | BOLT Phase 1 ‐ Arabic Treebank ARZ Part 2, V1.0 | 8/6/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 29213 | n/a | n/a
LDC2012E89 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 3 V2.0 | 9/12/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 31511 | n/a | n/a
LDC2012E92 | BOLT Phase 1 English Treebank DF Part 1 V1.0 | 7/25/2012 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 44319
LDC2012E93 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 1 V2.0 | 8/13/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 36757 | n/a | n/a
LDC2012E97 | BOLT Phase 1 ‐ English Treebank BOLT WB Part 2, V 1.0 | 8/7/2012 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 51603
LDC2012E98 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 2 V2.0 | 8/13/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 29213 | n/a | n/a
LDC2012E99 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 4 V2.0 | 8/14/2012 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 41461 | n/a | n/a
LDC2013E12 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 7 V1.0 | 1/31/2013 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 61112 | n/a | n/a
LDC2013E120 | BOLT Phase 2 Egyptian Arabic Treebank SMS/Chat Part 1 V2.0 | 11/15/2013 | Treebank | Egyptian Arabic Treebank on SMS/chat data | arz | SMS, chat | 7159 | n/a | n/a

LDC2013E127 | BOLT Phase 2 English Treebank SMS/Chat Part 1 | 12/6/2013 | Treebank | English treebank on SMS/chat data | eng | SMS, chat | n/a | n/a | 50292
LDC2013E128 | BOLT Phase 2 Chinese Treebank SMS/Chat Part 1 | 11/29/2014 | Treebank | Chinese treebank on SMS/chat data | cmn | SMS, chat | n/a | 50014 | n/a
LDC2013E133 | BOLT Phase 2 Egyptian Arabic Treebank SMS/Chat Part 2 V1.0 | 2/3/2014 | Treebank | Egyptian Arabic Treebank on SMS/chat data | arz | SMS, chat | 41678 | n/a | n/a
LDC2013E17 | BOLT Phase 1 English Treebank DF Part 4 V1.0 | 2/13/2013 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 53528
LDC2013E21 | BOLT Phase 1 Egyptian Arabic Treebank DF Part 8 V1.0 | 3/4/2013 | Treebank | Egyptian Arabic Treebank on discussion forum data | arz | discussion forum | 65054 | n/a | n/a
LDC2013E32 | BOLT Phase 1 Chinese Treebank DF Part 4 | 4/12/2013 | Treebank | Chinese treebank on discussion forum data | cmn | discussion forum | n/a | n/a | n/a
LDC2013E40 | BOLT Phase 1 English Treebank DF Part 5 V1.0 | 3/28/2013 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 68445
LDC2013E50 | BOLT Phase 1 English Treebank DF Part 6 V1.1 ‐‐ ECTB | 6/4/2013 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 52972
LDC2013E76 | BOLT Phase 1 English Treebank DF Part 7 V1.0 ‐‐ ECTB | 8/5/2013 | Treebank | English Treebank on discussion forum data | eng | discussion forum | n/a | n/a | 93280
LDC2014E03 | BOLT Phase 2 English Treebank SMS/Chat Part 2 | 3/31/2014 | Treebank | English Treebank on discussion forum data | eng | SMS, chat | n/a | n/a | 50000
LDC2014E17 | BOLT Phase 2 Egyptian Arabic Treebank SMS/Chat Part 3 V1.0 | 3/31/2014 | Treebank | Egyptian Treebank on SMS/chat data | arz | SMS, chat | n/a | n/a | n/a
LDC2014E23 | BOLT Phase 2 Chinese Treebank SMS/Chat Part 2 | 3/31/2014 | Treebank | Chinese Treebank on SMS/chat data | cmn | SMS, chat | n/a | 149,000 | n/a
LDC2012E121 | BOLT Phase 1 Chinese Propbank DF Part 1 | 10/15/2012 | Propbank | Chinese PropBank on discussion forum data | cmn | discussion forum | n/a | 51412 | n/a
LDC2012E122 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 1 | 11/28/2012 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 36575 | n/a | n/a
LDC2012E123 | BOLT Phase 1 English Propbank DF Part 1 | 11/28/2012 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | 44319
LDC2012E128 | BOLT Phase 1 English Propbank DF Part 2 | 11/28/2012 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | 51603
LDC2012E129 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 2 | 11/28/2012 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 29213 | n/a | n/a
LDC2012E131 | BOLT Phase 1 Chinese Propbank DF Part 2 | 1/9/2013 | Propbank | Chinese PropBank on discussion forum data | cmn | discussion forum | n/a | 126461 | n/a
LDC2013E05 | BOLT Phase 1 English Propbank DF Part 3 | 1/24/2013 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | 51012
LDC2013E102 | BOLT Phase 1 English Propbank DF Part 6 V2.0 | 10/15/2013 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | n/a
LDC2013E108 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 8 | 11/15/2013 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 65054 | n/a | n/a
LDC2013E129 | BOLT Phase 1 English Propbank DF Part 7 | 11/29/2013 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | n/a
LDC2013E22 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 3 | 3/19/2013 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 31000 | n/a | n/a
LDC2013E23 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 4 | 3/19/2013 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 41000 | n/a | n/a
LDC2013E24 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 5 | 3/19/2013 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 36000 | n/a | n/a
LDC2013E33 | BOLT Phase 1 Chinese Propbank DF Part 3 | 4/24/2013 | Propbank | Chinese PropBank on discussion forum data | cmn | discussion forum | n/a | 153874 | n/a
LDC2013E34 | BOLT Phase 1 English Propbank DF Part 4 V3.0 | 9/30/2013 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | 53528
LDC2013E72 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 6 V2 | 11/15/2013 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 98921 | n/a | n/a
LDC2013E73 | BOLT Phase 1 Egyptian Arabic Propbank DF Part 7 V2 | 11/15/2013 | Propbank | Egyptian Arabic PropBank on discussion forum data | arz | discussion forum | 61112 | n/a | n/a
LDC2013E74 | BOLT Phase 1 English Propbank DF Part 5 V2.0 | 10/15/2013 | Propbank | English PropBank on discussion forum data | eng | discussion forum | n/a | n/a | 68445
LDC2013E75 | BOLT Phase 1 Chinese Propbank DF Part 4 | 8/5/2013 | Propbank | Chinese PropBank on discussion forum data | cmn | discussion forum | n/a | 87698 | n/a
LDC2014E04 | BOLT Phase 2 Chinese Propbank SMS/Chat Part 1 | 2/11/2014 | Propbank | Chinese PropBank on discussion forum data | cmn | SMS, chat | n/a | 50046 | n/a
LDC2014E22 | BOLT Phase 2 English Propbank SMS/Chat Part 1 | 3/31/2014 | Propbank | English PropBank on discussion forum data | eng | SMS, chat | n/a | n/a | 50292
LDC2013E103 | BOLT Phase 1 Egyptian Arabic Co‐reference DF Part 4 | 10/15/2013 | Co‐reference | Egyptian Arabic co‐reference on discussion forum data | arz | discussion forum | 41461 | n/a | n/a
LDC2013E104 | BOLT Phase 1 Egyptian Arabic Co‐reference DF Part 5 | 10/15/2013 | Co‐reference | Egyptian Arabic co‐reference on discussion forum data | arz | discussion forum | 36419 | n/a | n/a
LDC2013E105 | BOLT Phase 1 English Co‐reference DF Part 5 | 10/15/2013 | Co‐reference | English co‐reference on discussion forum data | eng | discussion forum | n/a | n/a | 68445
LDC2013E106 | BOLT Phase 1 English Co‐reference DF Part 6 | 10/15/2013 | Co‐reference | English co‐reference on discussion forum data | eng | discussion forum | n/a | n/a | 52972
LDC2013E107 | BOLT Phase 1 Chinese Co‐reference DF Part 2 | 10/15/2013 | Co‐reference | Chinese co‐reference on discussion forum data | cmn | discussion forum | n/a | 54480 | n/a
LDC2013E35 | BOLT Phase 1 Egyptian Arabic Co‐reference DF Part 1 V2.0 | 5/9/2013 | Co‐reference | Egyptian Arabic co‐reference on discussion forum data | arz | discussion forum | 36757 | n/a | n/a
LDC2013E36 | BOLT Phase 1 English Co‐reference DF Part 1 | 4/24/2013 | Co‐reference | English co‐reference on discussion forum data | eng | discussion forum | n/a | n/a | 44319
LDC2013E37 | BOLT Phase 1 English Co‐reference DF Part 2 | 4/24/2013 | Co‐reference | English co‐reference on discussion forum data | eng | discussion forum | n/a | n/a | 51603
LDC2013E38 | BOLT Phase 1 Chinese Co‐reference DF Part 1 V3.0 | 4/24/2013 | Co‐reference | Chinese co‐reference on discussion forum data | cmn | discussion forum | n/a | 40000 | n/a
LDC2013E68 | BOLT Phase 1 Egyptian Arabic Co‐reference DF Part 2 | 8/5/2013 | Co‐reference | Egyptian Arabic co‐reference on discussion forum data | arz | discussion forum | 29213 | n/a | n/a
LDC2013E69 | BOLT Phase 1 Egyptian Arabic Co‐reference DF Part 3 | 8/5/2013 | Co‐reference | Egyptian Arabic co‐reference on discussion forum data | arz | discussion forum | 31511 | n/a | n/a
LDC2013E70 | BOLT Phase 1 English Co‐reference DF Part 3 | 7/15/2013 | Co‐reference | English co‐reference on discussion forum data | eng | discussion forum | n/a | n/a | 51012
LDC2013E71 | BOLT Phase 1 English Co‐reference DF Part 4 | 7/15/2013 | Co‐reference | English co‐reference on discussion forum data | eng | discussion forum | n/a | n/a | 53528

LDC2012E118 | BOLT Phase 1 IR Eval Assessment Results V1.1 | 10/17/2012 | IR | Assessment results for pooled system responses to BOLT P1 IR eval; first 50 bullets pooled from each run per topic | arz, cmn, eng
LDC2013E124 | BOLT Phase 2 IR Eval Assessment Results ‐ Relevance V2.0 | 11/22/2013 | IR | eval relevance assessment judgments for 32049 citations (system responses) to 100 BOLT Phase 2 Information Retrieval queries | arz, cmn, eng
LDC2013E134 | BOLT Phase 2 IR Relevance Eval Assessment Results ‐ Output from NIST | 12/20/2013 | IR | assessment results for system responses to IR evaluation relevance | arz, cmn, eng
LDC2013E136 | BOLT Phase 2 IR Evaluation Queries | 1/7/2014 | IR | 100 evaluation queries for the BOLT IR evaluation | arz, cmn, eng
LDC2013E20 | BOLT Phase 2 IR Pilot Assessment Results | 5/6/2013 | IR | assessment results for the 200 system responses to the BOLT Phase 2 Information Retrieval (IR) pilot query, distributed in LDC2013E08
LDC2013E46 | BOLT Phase 2 IR Dry Run Queries | 5/20/2013 | IR | 50 dry run queries for the BOLT IR Phase 2 | arz, cmn, eng
LDC2013E67 | BOLT Phase 2 IR Dry Run Assessment Results V1.1 | 8/5/2013 | IR | relevance assessment results for system responses to the BOLT Phase 2 IR dry run queries | arz, cmn, eng
LDC2014E01 | BOLT Phase 2 IR Eval Assessment Results ‐ Redundancy | 1/9/2014 | IR | redundancy assessment results for 229 views and 1295 groups of 4408 relevant, English citations to 61 IR queries | eng
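
The volume cells in the listing above are not uniform: most are bare word counts, some carry a unit suffix (`459588 chars`, `7 docs`), a few use comma grouping (`81,928`, `149,000`), and many are `n/a`. The following is a minimal sketch of how such a cell might be normalized before totals are computed. The function name `parse_volume` is hypothetical, and the sketch assumes these four forms are the only ones that occur; it deliberately does not convert character counts to words.

```python
import re
from typing import Optional, Tuple


def parse_volume(cell: str) -> Optional[Tuple[int, str]]:
    """Return (count, unit) for a volume cell, or None for 'n/a' or empty.

    Assumption: a cell is 'n/a', empty, or digits (optionally comma-grouped)
    followed by an optional 'chars' or 'docs' suffix. A bare number is
    treated as a word count, per the column headers.
    """
    text = cell.strip().lower()
    if text in ("", "n/a"):
        return None
    match = re.fullmatch(r"([\d,]+)\s*(chars|docs)?", text)
    if match is None:
        raise ValueError(f"unrecognized volume cell: {cell!r}")
    count = int(match.group(1).replace(",", ""))
    unit = match.group(2) or "words"
    return count, unit
```

Keeping the unit alongside the count avoids silently adding character counts to word counts when summing a column.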

Data Donated by Performers to BOLT; Contact [email protected] to obtain any missing corpora

Catalog ID | Title | Release Date | Type
LDC2012E12 | English Translation Treebank, EATB Part 16 V1.0 (POS) ‐ GALE Phase 5 BC | n/a | Performer‐donated data
LDC2012E13 | Arabic Treebank Part 17 V1.0 ‐ GALE Phase 5 WB | n/a | Performer‐donated data
LDC2012E17 | BBN/LDC/Sakhr Arabic‐Dialect/English Parallel Corpus | 3/23/2012 | Performer‐donated data
LDC2012E28 | Arabic Treebank ARZ Part 1, V1.0 | 6/6/2012 | Performer‐donated data
LDC2012E56 | AIDA: Automatic Identification of Dialectal Arabic | 5/23/2012 | Performer‐donated data
LDC2012E57 | CALIMA: Columbia Arabic Language Morphological Analyzer ‐‐ Egyptian Arabic | 5/23/2012 | Performer‐donated data
LDC2012E58 | CODAFY 0.1: Automatic mapper into the Conventional Orthography of Dialectal Arabic | 5/23/2012 | Performer‐donated data
LDC2012E59 | ELISSA: Dialectal Arabic to Modern Standard Arabic Translation System | 5/23/2012 | Performer‐donated data
LDC2012E60 | MADA‐ARZ Morphological Analysis and Disambiguation for Arabic (Egyptian version) | 5/23/2012 | Performer‐donated data
LDC2012E75 | BBN/LDC WebForum Selections Arabic/English Parallel Corpus | 6/12/2012 | Performer‐donated data
LDC2012E76 | BBN/LDC WebForum Selections Chinese/English Parallel Corpus | n/a | Performer‐donated data
LDC2012E77 | The 2012 IBM Egyptian Arabic Corpus | n/a | Performer‐donated data
LDC2012E78 | Chinese idiom translation dictionary + word segmenter dictionary ‐ web resources | n/a | Performer‐donated data
LDC2012E79 | bilingual data extracted from three Creative Commons (CC BY‐SA) sources | n/a | Performer‐donated data
LDC2012E80 | 8 years worth of summary/article sets collected via Newsblaster | n/a | Performer‐donated data

Data Created during BOLT; Corpora released for data team internal use

Catalog ID | Title | Release Date | Type | Description | Source Language | Genre | arz data volume | cmn data volume | eng data volume
LDC2012R16 | BOLT Phase 1 ‐ Chinese Source Data R1 | 2/22/2012 | LDC/NIST or data team internal release | incremental release of Chinese discussion forum source data with SU annotations, for use in the downstream annotation pipeline
LDC2012R18 | BOLT Phase 1 HTER Experiment Source and Reference Translation | 3/26/2012 | LDC/NIST or data team internal release | source files and corresponding translation files for the BOLT Phase 1 HTER experiment. | arz, cmn | discussion forum | 4792 | 9789 | n/a
LDC2012R23 | BOLT ‐ Selected & Segmented Source Data for Annotation R1 | 3/19/2012 | LDC/NIST or data team internal release | source data selected and sentence‐segmented (SU) data for use in downstream BOLT translation and annotation tasks. | arz, cmn
LDC2012R27 | BOLT Phase 1 Dev Test Source Superset | n/a | LDC/NIST or data team internal release | Dev Test superset data
LDC2012R28 | BOLT Phase 1 R2 Eval Source Superset | 4/23/2012 | LDC/NIST or data team internal release | source files for BOLT Phase 1 R2 Eval and DevTest Source Superset for Arabic and Chinese | arz, cmn
LDC2012R33 | BOLT Phase 1 HTER Experiment MT Post‐editing Results | 4/25/2012 | LDC/NIST or data team internal release | post‐editing results for the BOLT Phase 1 HTER experiment
LDC2012R34 | BOLT Phase 1 IR Dry Run Queries | 5/2/2012 | LDC/NIST or data team internal release | Phase 1 dry run information retrieval queries | arz, cmn, eng | discussion forum
LDC2012R36 | BOLT ‐ Selected & Segmented Source Data for Annotation R2 | 4/30/2012 | LDC/NIST or data team internal release | increment of source data selected and sentence‐segmented (SU) data for use in downstream BOLT translation and annotation tasks | arz, cmn
LDC2012R37 | BOLT Phase 1 R3 Eval Source Superset | n/a | LDC/NIST or data team internal release | the Arabic and Chinese eval selection from the R3 sequestered collection for NIST to downsample | arz, cmn
LDC2012R38 | BOLT Phase 1 Eval Superset Source and First Pass Translation | 9/10/2013 | LDC/NIST or data team internal release | Source and initial translation files to enable selection of Phase 1 evaluation set | arz, cmn | discussion forum | 199150 | 299622 chars
LDC2012R49 | BOLT ‐ Selected & Segmented Source Data for Annotation R3 | 5/22/2012 | LDC/NIST or data team internal release | source data selected and sentence‐segmented (SU) data for use in downstream BOLT translation and annotation tasks | arz, cmn, eng
LDC2012R52 | BOLT Phase 1 Eval Final Set Source and Translation | 7/17/2012 | LDC/NIST or data team internal release | Translation gold standard reference for Arabic and Chinese for P1 HTER evaluation | arz, cmn | discussion forum | 21720 | 35944 chars
LDC2012R53 | BOLT Phase 1 IR Eval Queries | 7/6/2012 | LDC/NIST or data team internal release | queries for the BOLT Phase 1 IR evaluation. | arz, cmn, eng | discussion forum
LDC2012R54 | BOLT Phase 1 IR Dry Run Assessments | 6/19/2012 | LDC/NIST or data team internal release | assessment annotations for team responses to the BOLT phase 1 IR dry run queries | arz, cmn
LDC2012R65 | BOLT Phase 1 MT Evaluation Post Editing Results | 8/30/2012 | LDC/NIST or data team internal release | machine translation post‐editing results for the BOLT Phase 1 MT Evaluation Post Editing effort
LDC2012R71 | BOLT Phase 1 IR Assessment Results | 10/17/2012 | LDC/NIST or data team internal release | assessment results for system responses to the BOLT Information Retrieval (IR) Phase 1 Evaluation Queries (LDC2012R53)
LDC2012R72 | BOLT Phase 1 MT DevTest Post Editing Results | 9/14/2012 | LDC/NIST or data team internal release | Post editing results for 5k word portion P1 DevTest data
LDC2012R77 | BOLT ‐ Selected & Segmented Source Data for Annotation R4 | 9/20/2012 | LDC/NIST or data team internal release | incremental release source data with SU annotations for use in the downstream annotation pipeline
LDC2013R11 | BOLT Phase 1 Additional MT Post Editing Results | n/a | LDC/NIST or data team internal release | MTPE results for additional site, subset of original phase 1 evaluation set
LDC2013R12 | BOLT Phase 2 Eval and DevTest R1 Superset | 6/17/2013 | LDC/NIST or data team internal release | source files for BOLT Phase 1 R2 Eval and DevTest Source Superset for Arabic and Chinese | cmn | SMS | n/a | 20246 | n/a
LDC2013R15 | BOLT Phase 2 Eval and DevTest R2 Superset | 7/18/2013 | LDC/NIST or data team internal release | source files for BOLT Phase 2 Eval and DevTest Superset for Chinese for downsampling purposes. | cmn | conversation | n/a | 103359 | n/a
LDC2013R21 | BOLT Phase 2 SMS/Chat SUed Source Data for Annotation | n/a | LDC/NIST or data team internal release | English and Chinese SUed SMS/Chat source data selected for the use of downstream annotation. | cmn | SMS, chat | n/a | 200067 | n/a
LDC2013R22 | BOLT Phase 2 Translation Eval Data R1 | n/a | LDC/NIST or data team internal release | First increment of Chinese Translation Eval data
LDC2013R34 | BOLT Phase 2 IR Eval Queries | 9/15/2013 | LDC/NIST or data team internal release | 100 evaluation queries for the BOLT IR Phase 2 task: 34 English, 33 Arabic, and 33 Chinese queries | arz, cmn, eng
LDC2013R39 | BOLT Phase 2 Eval Chinese Final Set Source and Translation | 9/30/2013 | LDC/NIST or data team internal release | source files and corresponding gold standard translation files for BOLT Phase 3 MT Evaluation HTER post‐editing | cmn | SMS, chat | n/a | 19990 | n/a

LDC2013R40 | BOLT SMS/Chat Egyptian Arabic Training Conversations for Transliteration | n/a | LDC/NIST or data team internal release | training Arabic conversations in su.xml format for running conversations from Arabizi to Arabic orthography. | arz | SMS, chat
LDC2013R42 | BOLT ‐ Selected & Segmented SMS/Chat Source Data for Annotation R2 | 11/4/2013 | LDC/NIST or data team internal release | Source files of sentence segmented for Annotation. | cmn | SMS, chat
LDC2013R44 | BOLT Phase 2 IR Eval Assessment Results ‐ Relevance SAMPLE | 11/15/2013 | LDC/NIST or data team internal release | a sample of LDC's eval relevance assessment judgments for 440 citations/system responses to 5 BOLT Phase 2 Information Retrieval (IR) eval queries | arz, cmn, eng
LDC2013R47 | BOLT Phase 2 MT Evaluation Post Editing Results V2 | 12/13/2013 | LDC/NIST or data team internal release | machine translation post‐editing results for the BOLT Phase 2 MT Evaluation Post Editing effort
LDC2013R48 | BOLT Phase 2 Eval and DevTest R3 Superset | 11/22/2013 | LDC/NIST or data team internal release | source files for the BOLT Phase 2 Eval and DevTest Superset for downsampling purposes | arz | SMS, chat | 159287 | n/a | n/a
LDC2013R50 | BOLT Selected and Segmented SMS and Chat Source Data for Annotation R3 | 12/13/2013 | LDC/NIST or data team internal release | source files sentence segmented for the third increment BOLT Phase 2 SMS/Chat SU Source Data for Annotation | arz, eng | SMS, chat | 50451 | n/a | 31477
LDC2013R53 | BOLT Phase 2 MT DevTest Post Editing Results | 12/20/2023 | LDC/NIST or data team internal release | machine translation post‐editing results for the BOLT Phase 2 MT DevTest Post Editing effort
LDC2014R01 | BOLT Phase 2 Arabic SMS/Chat Training Conversations for Transliteration | 10/31/2012 | LDC/NIST or data team internal release | conversations from the Egyptian Arabic SMS/Chat collection for conversion into Arabic orthography | arz | SMS, chat
LDC2014R02 | BOLT ‐ Selected & Segmented SMS/Chat Source Data for Annotation R4 | 1/24/2014 | LDC/NIST or data team internal release | source files sentence‐segmented for annotation | cmn | SMS, chat
LDC2014R03 | BOLT Phase 2 Translation Eval Data R2 | 2/14/2014 | LDC/NIST or data team internal release | source files and corresponding first‐pass translation files for BOLT Phase 2 Translation Eval Data | arz | SMS, chat | 26185 | n/a | n/a
LDC2014R07 | BOLT Phase2 Arabic SMS and Chat Training Conversations for Transliteration R2 | 3/4/2014 | LDC/NIST or data team internal release | the source conversations for conversion from Romanized to Arabic orthography, using software known as 3ARRIB. | arz | SMS, chat
LDC2014R09 | BOLT Phase 2 Eval Arabic Final Set Source and Gold Standard Translation | 3/25/2014 | LDC/NIST or data team internal release | source files and corresponding gold standard translation files | arz | SMS, chat | 19696 | n/a | n/a
LDC2014R12 | BOLT Phase 2 SMS and Chat Arabic Training Data ‐‐ Source Annotation, Transliteration and Translation | n/a | LDC/NIST or data team internal release | Arabizi conversations with source, annotation, original transliteration, corrected transliteration data | arz | SMS, chat

Data Created Prior to BOLT; Covered on BOLT evaluation license and available by request from LDC Membership Office ([email protected])

LDC93T1 | ACL/DCI | LDC Publication
LDC93T3A | TIPSTER Complete | LDC Publication
LDC94T4B‐1 | UN Parallel Text (English) | LDC Publication
LDC95T11 | European Language Newspaper Text | LDC Publication
LDC95T13 | Mandarin Chinese News Text | LDC Publication
LDC95T21 | North American News | LDC Publication
LDC96L14 | CELEX2 | LDC Publication
LDC96L15 | CALLHOME Mandarin Chinese Lexicon | LDC Publication
LDC96S34 | CALLHOME Mandarin Chinese Speech | LDC Publication
LDC96S46 | CALLFRIEND American English‐Non‐Southern Dialect | LDC Publication
LDC96S47 | CALLFRIEND American English‐Southern Dialect | LDC Publication
LDC96S49 | CALLFRIEND Egyptian Arabic | LDC Publication
LDC96S55 | CALLFRIEND Mandarin Chinese‐Mainland Dialect | LDC Publication
LDC96S56 | CALLFRIEND Mandarin Chinese‐Taiwan Dialect | LDC Publication
LDC96T16 | CALLHOME Mandarin Chinese Transcripts | LDC Publication
LDC97L19 | CALLHOME Egyptian Arabic Lexicon | LDC Publication
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) | LDC Publication
LDC97S42 | CALLHOME American English Speech | LDC Publication
LDC97S44 | 1996 English Broadcast News Speech (Hub‐4) | LDC Publication
LDC97S45 | CALLHOME Egyptian Arabic Speech | LDC Publication
LDC97S62 | SWITCHBOARD‐1 Release 2 | LDC Publication
LDC97S66 | 1996 English Broadcast News Dev and Eval (Hub‐4) | LDC Publication
LDC97T14 | CALLHOME American English Transcripts | LDC Publication
LDC97T19 | CALLHOME Egyptian Arabic Transcripts | LDC Publication
LDC97T22 | 1996 English Broadcast News Transcripts (Hub‐4) | LDC Publication
LDC98S69 | Hub‐5 Mandarin Telephone Speech Corpus | LDC Publication
LDC98S71 | 1997 English Broadcast News Speech (Hub‐4) | LDC Publication
LDC98S72 | Taiwanese Putonghua Speech and Transcripts | LDC Publication
LDC98S73 | 1997 Mandarin Broadcast News Speech (Hub‐4NE) | LDC Publication
LDC98S75 | Switchboard‐2 Phase 1 | LDC Publication
LDC98T24 | 1997 Mandarin Broadcast News Transcripts (Hub‐4NE) | LDC Publication
LDC98T25 | TDT Pilot Study Corpus | LDC Publication
LDC98T26 | Hub‐5 Mandarin Transcripts | LDC Publication

LDC98T28 | 1997 English Broadcast News Transcripts (Hub‐4) | LDC Publication
LDC98T30 | North American News Text Supplement | LDC Publication
LDC99L22 | Egyptian Colloquial Arabic Lexicon | LDC Publication
LDC99L23 | American English Spoken Lexicon | LDC Publication
LDC99S79 | Switchboard‐2 Phase II | LDC Publication
LDC99S84 | TDT2 English Audio | LDC Publication
LDC99T37 | TDT2 English Text, Version 2 | LDC Publication
LDC99T38 | TDT2 Mandarin Text | LDC Publication
LDC99T42 | Treebank‐3 | LDC Publication
LDC2000S92 | TDT2 Careful Transcription Audio | LDC Publication
LDC2000T43 | BLLIP 1987‐89 WSJ Corpus Release 1 | LDC Publication
LDC2000T44 | TDT2 Careful Transcription Text | LDC Publication
LDC2000T46 | Hong Kong News Parallel Text | LDC Publication
LDC2000T47 | Hong Kong Laws Parallel Text | LDC Publication
LDC2000T50 | Hong Kong Hansards Parallel Text | LDC Publication
LDC2000T52 | TREC Mandarin | LDC Publication
LDC2001S13 | Switchboard Cellular Part 1 Audio | LDC Publication
LDC2001S15 | Switchboard Cellular Part 1 Transcribed Audio | LDC Publication
LDC2001S91 | 1997 HUB‐4 Broadcast News Evaluation Non English Test Material | LDC Publication
LDC2001S93 | TDT2 Mandarin Audio Corpus | LDC Publication
LDC2001S94 | TDT3 English Audio | LDC Publication
LDC2001S95 | TDT3 Mandarin Audio | LDC Publication
LDC2001T02 | Message Understanding Conference (MUC) 7 | LDC Publication
LDC2001T11 | Chinese Treebank Version 2.0 | LDC Publication
LDC2001T14 | Switchboard Cellular Part 1 Transcription | LDC Publication
LDC2001T55 | Arabic Newswire Part 1 | LDC Publication
LDC2001T57 | TDT2 Multilanguage Text Version 4.0 | LDC Publication
LDC2001T58 | TDT3 Multilanguage Text Version 2.0 | LDC Publication

LDC2002L27 | Chinese‐English Translation Lexicon Version 3.0 | LDC Publication
LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 | LDC Publication
LDC2002S06 | Switchboard‐2 Phase III Audio | LDC Publication
LDC2002S09 | 2000 Hub5 English Evaluation Speech | LDC Publication
LDC2002S10 | 1998 HUB5 English Evaluation | LDC Publication
LDC2002S11 | 1997 HUB4 English Evaluation Speech and Transcripts | LDC Publication
LDC2002S12 | 2001 HUB5 Mandarin Evaluation | LDC Publication
LDC2002S13 | 2001 HUB5 English Evaluation | LDC Publication
LDC2002S22 | 1997 HUB5 Arabic Evaluation | LDC Publication
LDC2002S23 | 1997 HUB5 English Evaluation (includes transcripts) | LDC Publication
LDC2002S37 | Callhome Egyptian Arabic Speech Supplement | LDC Publication
LDC2002T01 | Multiple‐Translation Chinese Corpus | LDC Publication
LDC2002T38 | Callhome Egyptian Arabic Transcripts Supplement | LDC Publication
LDC2002T43 | 2000 HUB5 English Evaluation Transcripts | LDC Publication
LDC2002T39 | 1997 HUB5 Arabic Transcripts | LDC Publication
LDC2003T01 | 2001 HUB5 Mandarin Transcripts | LDC Publication
LDC2003T02 | 1998 HUB5 English Transcripts | LDC Publication
LDC2003T05 | English Gigaword | LDC Publication
LDC2003T06 | Arabic Treebank: Part 1 v 2.0 | LDC Publication
LDC2003T07 | Arabic Treebank: Part 1 ‐ 10K‐word English Translation | LDC Publication
LDC2003T09 | Chinese Gigaword | LDC Publication
LDC2003T11 | ACE‐2 Version 1.0 | LDC Publication
LDC2003T12 | Arabic Gigaword | LDC Publication
LDC2003T17 | Multiple‐Translation Chinese (MTC) Part 2 | LDC Publication
LDC2003T18 | Multiple‐Translation Arabic (MTA) Part 1 | LDC Publication
LDC2004L02 | Buckwalter Arabic Morphological Analyzer Version 2.0 | LDC Publication
LDC2004S07 | Switchboard Cellular Part 2 Audio | LDC Publication
LDC2004S08 | MDE RT‐03 Training Data Speech | LDC Publication
LDC2004S10 | Santa Barbara Corpus of Spoken American English III | LDC Publication
LDC2004S11 | 2002 Rich Transcription Broadcast News and Conversational Telephone Speech | LDC Publication
LDC2004S13 | Fisher English Training Speech Part 1 Speech | LDC Publication
LDC2004T02 | Arabic Treebank: Part 2 v 2.0 | LDC Publication
LDC2004T05 | Chinese Treebank Version 4.0 | LDC Publication
LDC2004T07 | Multiple‐Translation Chinese (MTC) Part 3 | LDC Publication
LDC2004T08 | Hong Kong Parallel Text | LDC Publication
LDC2004T09 | TIDES Extraction (ACE) 2003 Multilingual Training Data | LDC Publication
LDC2004T11 | Arabic Treebank: Part 3 v 1.0 | LDC Publication
LDC2004T12 | MDE RT‐03 Training Data Text and Annotations | LDC Publication
LDC2004T14 | Proposition Bank I | LDC Publication
LDC2004T17 | Arabic News Translation Text Part 1 | LDC Publication
LDC2004T18 | Arabic English Parallel News Part 1 | LDC Publication
LDC2004T19 | Fisher English Training Speech Part 1, Transcripts | LDC Publication
LDC2004T23 | Prague Arabic Dependency Treebank 1.0 | LDC Publication
LDC2005S07 | Levantine Arabic QT Training Data Set 3 Speech | LDC Publication

LDC2005S11 | TDT4 Multilingual Broadcast News Speech Corpus | LDC Publication
LDC2005S13 | Fisher English Training Part 2, Speech | LDC Publication
LDC2005S14 | Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) | LDC Publication
LDC2005S15 | HKUST Mandarin Telephone Speech, Part 1 | LDC Publication
LDC2005S16 | MDE RT04 Training Data Speech | LDC Publication
LDC2005S25 | Santa Barbara Corpus of Spoken American English Part‐IV | LDC Publication
LDC2005T01 | Chinese Treebank 5.0 | LDC Publication
LDC2005T01U01 | Chinese Treebank 5.1 | LDC Publication
LDC2005T02 | Arabic Treebank: Part 1 v 3.0 (POS with full vocal. + syntactic analysis) | LDC Publication
LDC2005T03 | Levantine Arabic QT Training Data Set 3 Transcripts | LDC Publication
LDC2005T05 | Multiple‐Translation Arabic (MTA) Part 2 | LDC Publication
LDC2005T06 | Chinese News Translation Text Part 1 | LDC Publication
LDC2005T07 | ACE Time Normalization (TERN) 2004 English Training Data V1.0 | LDC Publication
LDC2005T08 | Discourse Graphbank | LDC Publication
LDC2005T09 | ACE 2004 Multilingual Training Corpus | LDC Publication
LDC2005T10 | Chinese English News Magazine Parallel Text | LDC Publication
LDC2005T12 | English Gigaword Second Edition | LDC Publication
LDC2005T13 | CCGbank | LDC Publication
LDC2005T14 | Chinese Gigaword Second Edition | LDC Publication
LDC2005T16 | TDT4 Multilingual Text and Annotations | LDC Publication
LDC2005T19 | Fisher English Training Part 2, Transcripts | LDC Publication
LDC2005T20 | Arabic Treebank: Part 3 (full corpus) v2.0 (MPG + Syntactic Analysis) | LDC Publication
LDC2005T23 | Chinese Proposition Bank 1.0 | LDC Publication
LDC2005T24 | MDE RT‐04 Training Data Text/Annotations | LDC Publication
LDC2005T32 | HKUST Mandarin Telephone Transcript Data, Part 1 | LDC Publication

LDC2005T33 | BBN Pronoun Coreference and Entity Type Corpus | LDC Publication
LDC2005T34 | Chinese <‐> English Name Entity Lists (v1.0) | LDC Publication
LDC2007T08 | ISI Arabic‐English Automatically Extracted Parallel Text | LDC Publication
LDC2007T09 | ISI Chinese‐English Automatically Extracted Parallel Text | LDC Publication
LDC2007T20 | GALE Phase 1 Distillation Training | LDC Publication
LDC2007T23 | GALE Phase 1 Chinese Broadcast News Parallel Text ‐ Part 1 | LDC Publication
LDC2007T24 | GALE Phase 1 Arabic Broadcast News Parallel Text ‐ Part 1 | LDC Publication
LDC2008T02 | GALE Phase 1 Arabic Blog Parallel Text | LDC Publication
LDC2008T06 | GALE Phase 1 Chinese Blog Parallel Text | LDC Publication
LDC2008T08 | GALE Phase 1 Chinese Broadcast News Parallel Text ‐ Part 2 | LDC Publication
LDC2008T09 | GALE Phase 1 Arabic Broadcast News Parallel Text ‐ Part 2 | LDC Publication
LDC2008T17 | CALLHOME Mandarin Chinese Transcripts ‐ XML version | LDC Publication
LDC2008T18 | GALE Phase 1 Chinese Broadcast News Parallel Text ‐ Part 3 | LDC Publication
LDC2009T02 | GALE Phase 1 Chinese Broadcast Conversation Parallel Text ‐ Part 1 | LDC Publication
LDC2009T03 | GALE Phase 1 Arabic Newsgroup Parallel Text ‐ Part 1 | LDC Publication
LDC2009T06 | GALE Phase 1 Chinese Broadcast Conversation Parallel Text ‐ Part 2 | LDC Publication
LDC2009T09 | GALE Phase 1 Arabic Newsgroup Parallel Text ‐ Part 2 | LDC Publication
LDC2009T15 | GALE Phase 1 Chinese Newsgroup Parallel Text ‐ Part 1 | LDC Publication
LDC2010T03 | GALE Phase 1 Chinese Newsgroup Parallel Text ‐ Part 2 | LDC Publication
LDC2011T05 | 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set | LDC Publication
LDC2011T07 | English Gigaword Corpus Fifth Edition | LDC Publication
LDC2011T11 | Arabic Gigaword Corpus Fifth Edition | LDC Publication
LDC2011T13 | Chinese Gigaword Corpus Fifth Edition | LDC Publication
LDC2004E10 | RT‐04F STT Multilingual Speech Development Data ‐ Supplement | EARS Ecorpus
LDC2004E16 | RT‐04 MDE DevTest Set #1 Version 1.2 | EARS Ecorpus
LDC2004E18 | RT‐04F STT Multilingual Speech Development Data V1.1 Re‐release | EARS Ecorpus
LDC2004E19 | RT‐04F STT Multilingual Transcripts Development Data V1.2 | EARS Ecorpus
LDC2004E24 | RT‐04 MDE Annotation Consistency Study | EARS Ecorpus
LDC2004E28 | RT‐04 STT Transcription Consistency Study | EARS Ecorpus
LDC2004E29 | RT‐04 MDE DevTest Set #2 V1.2 | EARS Ecorpus
LDC2004E33 | EARS MDE Diarization Scoring Package | EARS Ecorpus
LDC2004E47 | RT‐04 MDE Non‐English Pilot Corpus V1.0 | EARS Ecorpus
LDC2004E65 | Levantine Arabic QT Training Data Set 2 Speech | EARS Ecorpus
LDC2004E66 | Levantine Arabic QT Training Data Set 2 Transcripts V1.2 | EARS Ecorpus
LDC2004E67 | RT‐04F STT Chinese CTS Development Data Speech | EARS Ecorpus
LDC2004E68 | RT‐04F STT Chinese CTS Development Data Transcripts | EARS Ecorpus
LDC2005E26 | Switchboard‐1 Quick Transcripts from BBN/WordWave | EARS Ecorpus
LDC2005E73 | EARS RT04 Evaluation Transcripts and MDE Annotations | EARS Ecorpus
LDC2005E74 | EARS RT04 Evaluation Audio | EARS Ecorpus
LDC2005E24 | GALE MT Dry Run Data Set V1.0 | GALE Ecorpus
LDC2005E27 | GALE MT DevTest Data V2.1 | GALE Ecorpus
LDC2005E60 | GALE Kickoff Release ‐ VOA Arabic Broadcast News Audio | GALE Ecorpus
LDC2005E61 | GALE Kickoff Release ‐ Broadcast Conversation Audio V1.0 | GALE Ecorpus
LDC2005E62 | GALE Kickoff Release ‐ Broadcast News Audio V1.0 | GALE Ecorpus
LDC2005E63 | GALE Kickoff Release ‐ Broadcast Conversation Transcripts V1.0 | GALE Ecorpus
LDC2005E66 | GALE Kickoff Release ‐ Arabic Names Extracted from ACE V1.0 | GALE Ecorpus
LDC2005E68 | GALE Kickoff Release ‐ Arabic Names Extracted from ATB V1.0 | GALE Ecorpus
LDC2005E69 | GALE Kickoff Release ‐ English‐Arabic Parallel Treebank V1.0 | GALE Ecorpus
LDC2005E71 | GALE Kickoff Release ‐ VOA Arabic Broadcast News Transcripts | GALE Ecorpus
LDC2005E76 | GALE Kickoff Release 2 ‐‐ Levantine Arabic CTS Audio | GALE Ecorpus
LDC2005E77 | GALE Kickoff Release 2 ‐‐ Levantine Arabic CTS Transcripts | GALE Ecorpus
LDC2005E78 | GALE Kickoff Release 2 ‐‐ Levantine Arabic CTS Treebank | GALE Ecorpus
LDC2005E79 | GALE Kickoff Release 2 ‐ English CTS Treebank with Structural Metadata | GALE Ecorpus
LDC2005E80 | GALE Y1 Q1 Release Broadcast Audio Data V1.0 | GALE Ecorpus
LDC2005E81 | GALE Y1 Q1 Release ‐ Web Text Collection V1.0 | GALE Ecorpus
LDC2005E82 | GALE Y1 Q1 Release ‐ Transcripts V1.0 | GALE Ecorpus
LDC2005E83 | GALE Y1 Q1 Release ‐ Translations V1.0 | GALE Ecorpus
LDC2005E84 | GALE Y1 Q1 Release ‐ Arabic Treebank v 1.0 | GALE Ecorpus
LDC2005E85 | GALE Y1 Q1 Release ‐ English Translation Treebank v 1.0 | GALE Ecorpus
LDC2006E10 | GALE Y1 ‐ Chinese BN Common DevTest Set | GALE Ecorpus
LDC2006E11 | RT‐03 STT Evaluation Chinese BN Audio (for GALE Y1 Common DevTest Set) | GALE Ecorpus
LDC2006E12 | GALE Y1 ‐ Distillation Dry Run Data | GALE Ecorpus
LDC2006E13 | GALE Y1 ‐ MT Chinese Extended Dry Run | GALE Ecorpus
LDC2006E14 | GALE Y1 ‐ MT Arabic Extended Dry Run | GALE Ecorpus
LDC2006E15 | GALE Y1 ‐ Distillation Training Query Answer Keys V7.0 | GALE Ecorpus
LDC2006E17 | GALE Y1 ‐ MT Dry Run | GALE Ecorpus
LDC2006E18 | GALE Y1 ‐ MT Evaluation | GALE Ecorpus
LDC2006E21 | GALE Y1 ‐ Distillation Evaluation Audio | GALE Ecorpus
LDC2006E22 | GALE Y1 ‐ Distillation Evaluation Newswire | GALE Ecorpus
LDC2006E23 | GALE Y1 ‐ Interim Release: Transcripts | GALE Ecorpus
LDC2006E24 | GALE Y1 ‐ Interim Release: Translations | GALE Ecorpus
LDC2006E25 | GALE Y1 ‐ Arabic English Parallel News Text | GALE Ecorpus
LDC2006E26 | GALE Y1 ‐ English Chinese Parallel Financial News | GALE Ecorpus
LDC2006E31 | GALE Y1 Q2 Release ‐ Broadcast Audio V1.0 | GALE Ecorpus
LDC2006E32 | GALE Y1 Q2 Release ‐ Web Text Collection V1.0 | GALE Ecorpus
LDC2006E33 | GALE Y1 Q2 Release ‐ Transcripts V1.0 | GALE Ecorpus
LDC2006E34 | GALE Y1 Q2 Release ‐ Translations V2.0 | GALE Ecorpus
LDC2006E35 | GALE Y1 Q2 Release ‐ Arabic Treebank v 1.0 | GALE Ecorpus
LDC2006E36 | GALE Y1 Q2 Release ‐ English Translation Treebank v 1.0 | GALE Ecorpus

LDC2006E45 | GALE Y1 ‐ Distillation Blind Evaluation Newswire | GALE Ecorpus
LDC2006E46_A | GALE Y1 ‐ Distillation Blind Evaluation Audio Part A | GALE Ecorpus
LDC2006E46_B | GALE Y1 ‐ Distillation Blind Evaluation Audio Part B | GALE Ecorpus
LDC2006E46_C | GALE Y1 ‐ Distillation Blind Evaluation Audio Part C | GALE Ecorpus
LDC2006E46_D | GALE Y1 ‐ Distillation Blind Evaluation Audio Part D | GALE Ecorpus
LDC2006E46_E | GALE Y1 ‐ Distillation Blind Evaluation Audio Part E | GALE Ecorpus
LDC2006E52 | GALE Y1 ‐ MT Eval Post Editing Results | GALE Ecorpus
LDC2006E57 | GALE Y1 Evaluation Data ‐ Retranscribed | GALE Ecorpus
LDC2006E77 | GALE Y1 Q3 Release ‐ Web Text Collection | GALE Ecorpus
LDC2006E82 | GALE Y1 Q3 Release ‐ English Translation Treebank | GALE Ecorpus
LDC2006E83 | GALE Y1 Q3 Release ‐ Broadcast Audio | GALE Ecorpus
LDC2006E84 | GALE Y1 Q3 Release ‐ Transcripts | GALE Ecorpus
LDC2006E85 | GALE Y1 Q3 Release ‐ Translations | GALE Ecorpus
LDC2006E86 | GALE Y1 Q3 Release ‐ Word Alignment | GALE Ecorpus
LDC2006E87 | GALE Y1 Q3 Release ‐ Arabic Treebank | GALE Ecorpus
LDC2006E88 | GALE Y1 ‐ Web 1T 5‐gram Version 1 | GALE Ecorpus
LDC2006E89 | GALE Y1 Q4 Release ‐ Broadcast Audio | GALE Ecorpus
LDC2006E90 | GALE Y1 Q4 Release ‐ Web Text Collection | GALE Ecorpus
LDC2006E91 | GALE Y1 Q4 Release ‐ Transcripts | GALE Ecorpus
LDC2006E92 | GALE Y1 Q4 Release ‐ Translations | GALE Ecorpus
LDC2006E93 | GALE Y1 Q4 Release ‐ Word Alignment | GALE Ecorpus
LDC2006E94 | GALE Y1 Q4 Release ‐ Arabic Treebank | GALE Ecorpus
LDC2006E95 | GALE Y1 Q4 Release ‐ English Translation Treebank | GALE Ecorpus
LDC2007E01 | GALE Phase 2 Distillation Evaluation ‐ Supplemental English Broadcast Audio | GALE Ecorpus
LDC2007E02 | GALE Phase 2 Distillation Evaluation ‐ Supplemental Multilingual Newswire | GALE Ecorpus
LDC2007E03 | GALE Phase 2 Release 1 ‐ Broadcast Audio | GALE Ecorpus
LDC2007E04 | GALE Phase 2 Release 1 ‐ Web Text | GALE Ecorpus
LDC2007E05 | GALE Phase 2 Release 1 ‐ Transcripts | GALE Ecorpus
LDC2007E06 | GALE Phase 2 Release 1 ‐ Translations | GALE Ecorpus
LDC2007E100 | GALE Phase 3 Release 1 ‐ Transcripts | GALE Ecorpus
LDC2007E101 | GALE Phase 3 Release 1 ‐ Translations | GALE Ecorpus
LDC2007E102 | GALE Phase 3 Release 1 ‐ Web Text V 1.0 | GALE Ecorpus
LDC2007E103 | GALE Phase 3 Release 1 ‐ Found Parallel Text | GALE Ecorpus
LDC2007E104 | GALE Phase 3 Release 1 ‐ Distillation V1.1 | GALE Ecorpus
LDC2007E105 | GALE Phase 3 Release 1 ‐ English Translation Treebank | GALE Ecorpus
LDC2007E12 | GALE Phase 2 Dry Run Translations | GALE Ecorpus
LDC2007E13 | GALE Phase 2 Distillation ‐ Training V5.0 | GALE Ecorpus
LDC2007E15 | GALE Phase 2 DevTest ‐ Broadcast Audio V1.0 | GALE Ecorpus
LDC2007E16 | GALE Phase 2 DevTest ‐ Source Text, Transcripts and Translations V2.2 | GALE Ecorpus
LDC2007E29 | OntoNotes V1.0 ‐ GALE Pre‐Release | GALE Ecorpus
LDC2007E30 | RESTRICTED ‐ GALE Phase 2 Evaluation Pool ‐ Source Text Files V2.0 | GALE Ecorpus
LDC2007E32_A | GALE Phase 2 DevTest ‐ Broadcast Audio Supplement (Part 1) | GALE Ecorpus
LDC2007E32_B | GALE Phase 2 DevTest ‐ Broadcast Audio Supplement (Part 2) | GALE Ecorpus
LDC2007E32_C | GALE Phase 2 DevTest ‐ Broadcast Audio Supplement (Part 3) | GALE Ecorpus
LDC2007E32_D | GALE Phase 2 DevTest ‐ Broadcast Audio Supplement (Part 4) | GALE Ecorpus
LDC2007E43 | GALE Phase 2 Release 2 ‐ Broadcast Audio | GALE Ecorpus
LDC2007E44 | GALE Phase 2 Release 2 ‐ Web Text | GALE Ecorpus
LDC2007E45 | GALE Phase 2 Release 2 ‐ Transcripts | GALE Ecorpus
LDC2007E46 | GALE Phase 2 Release 2 ‐ Translations | GALE Ecorpus
LDC2007E52 | RESTRICTED ‐ GALE Phase 2 Evaluation Pool ‐ Broadcast Audio V 1.0 | GALE Ecorpus
LDC2007E53 | Pitt/CMU 18 GALE Distillation Task Scenario Kit | GALE Ecorpus
LDC2007E54 | RESTRICTED ‐ GALE Phase 2 GNG Evaluation Reference | GALE Ecorpus
LDC2007E56 | GALE Phase 2 GNG Evaluation ‐ Translation References | GALE Ecorpus
LDC2007E57 | GALE Phase 2 GNG Evaluation ‐ Transcription References | GALE Ecorpus
LDC2007E60 | GALE Phase 3 DevTest ‐ Broadcast Audio | GALE Ecorpus
LDC2007E61 | GALE Phase 3 DevTest ‐ Source Text, Transcripts and Translations | GALE Ecorpus
LDC2007E73 | OntoNotes V2.0 ‐ GALE Pre‐Release | GALE Ecorpus
LDC2007E86 | GALE Phase 2 Release 3 ‐ Transcripts | GALE Ecorpus
LDC2007E87 | GALE Phase 2 Release 3 ‐ Translations | GALE Ecorpus
LDC2007E99 | GALE Phase 3 Release 1 ‐ Broadcast Audio | GALE Ecorpus

LDC2008E02 | GALE Phase 2 Retest (P2.5) Evaluation References | GALE Ecorpus
LDC2008E03 | GALE Phase 1 Evaluation Data (NIST MT06 version) | GALE Ecorpus
LDC2008E04 | GALE Phase 2 Retest Chinese Source Transcripts | GALE Ecorpus
LDC2008E07 | GALE P3 Distillation English NW/Web Training Pool | GALE Ecorpus
LDC2008E08 | MT08 References for GALE | GALE Ecorpus
LDC2008E09 | IBM Multiple Translation Corpus for GALE | GALE Ecorpus
LDC2008E10 | GALE Phase 3 Word Alignment Training Data Part 1 | GALE Ecorpus
LDC2008E11 | GALE Phase 2 + Retest Evaluation References | GALE Ecorpus
LDC2008E16 | GALE Phase 3 Distillation QA Training Data V3.1 | GALE Ecorpus

LDC2008E18 | GALE Phase 3 Supra‐lexical Annotation Pilot V1.4 | GALE Ecorpus
LDC2008E19 | GALE Phase 3 Multiple Translations V3.0 | GALE Ecorpus
LDC2008E38 | GALE Phase 3 Release 2 ‐ Broadcast Audio | GALE Ecorpus
LDC2008E39 | GALE Phase 3 Release 2 ‐ Transcripts | GALE Ecorpus
LDC2008E40 | GALE Phase 3 Release 2 ‐ Translations | GALE Ecorpus
LDC2008E41 | GALE Phase 3 Release 2 ‐ Web Text | GALE Ecorpus
LDC2008E42 | GALE Phase 3 ‐ MTPlus Pilot | GALE Ecorpus
LDC2008E53 | GALE Phase 4 Release 1 ‐ Broadcast Audio V1.0 | GALE Ecorpus
LDC2008E54 | GALE Phase 4 Release 1 ‐ Web Text V1.0 | GALE Ecorpus
LDC2008E55 | GALE Phase 4 Release 1 ‐ Transcripts V1.0 | GALE Ecorpus
LDC2008E56 | GALE Phase 4 Release 1 ‐ Translations V2.0 | GALE Ecorpus
LDC2008E57 | GALE Phase 4 Release 1 ‐ Word Alignment V1.0 | GALE Ecorpus
LDC2008E65 | GALE P3 Distillation Experiment ‐ Source Data | GALE Ecorpus
LDC2009E01 | GALE Phase 3 Evaluation References | GALE Ecorpus
LDC2009E05 | OntoNotes Release 2.9 for GALE | GALE Ecorpus
LDC2009E07 | GALE Arabic Dialect Classification Web Text Pilot Study | GALE Ecorpus
LDC2009E105 | GALE Phase 4 Release 3 ‐ Found Parallel Text | GALE Ecorpus
LDC2009E107 | GALE Phase 4 Release 3 ‐ Broadcast Audio Supplement | GALE Ecorpus
LDC2009E108 | Arabic Treebank Part 6 V1.0 ‐ GALE Phase 4 dev09 | GALE Ecorpus
LDC2009E109 | English Translation Treebank Part 7 V1.0 ‐ GALE Phase 4 dev09 | GALE Ecorpus
LDC2009E13 | GALE Phase 4 Release 2 ‐ Broadcast Audio | GALE Ecorpus
LDC2009E14 | GALE Phase 4 Release 2 ‐ Web Text | GALE Ecorpus
LDC2009E15 | GALE Phase 4 Release 2 ‐ Transcripts | GALE Ecorpus
LDC2009E16 | GALE Phase 4 Release 2 ‐ Translations | GALE Ecorpus
LDC2009E17 | GALE Arabic Parallel Aligned Treebank Pilot | GALE Ecorpus
LDC2009E18 | OntoNotes English Parse Trees GALE Pre‐Release | GALE Ecorpus
LDC2009E25 | Patch for GALE Phase 4 Release 2 ‐ Broadcast Audio | GALE Ecorpus
LDC2009E51 | GALE Phase 4 ‐ Dialect Classification of broadcast conversation and webtext | GALE Ecorpus
LDC2009E54 | GALE Chinese Word Alignment Tagging Pilot | GALE Ecorpus
LDC2009E60 | OntoNotes V3.0 ‐ GALE Pre‐Release | GALE Ecorpus
LDC2009E66 | GALE Phase4 Distillation Source Audio and Text | GALE Ecorpus

LDC2009E67 | GALE Phase4 Distillation English Forced Alignment | GALE Ecorpus
LDC2009E68 | GALE Phase4 DevTest Text Source | GALE Ecorpus
LDC2009E69 | GALE Phase 4 DevTest Audio Source Snippets | GALE Ecorpus
LDC2009E70 | GALE Phase 4 DevTest Audio Source Transcripts | GALE Ecorpus
LDC2009E71 | GALE Phase4 DevTest Audio Supralexical Annotation | GALE Ecorpus
LDC2009E78 | GALE Phase 4 DevTest NW & WB Translations | GALE Ecorpus
LDC2009E81 | GALE Phase 4 Arabic Training Transcripts, ASR‐selection sample | GALE Ecorpus
LDC2009E82 | GALE Phase 4 Arabic Parallel Aligned Treebank Part 1 V1.2 | GALE Ecorpus
LDC2009E83 | GALE Phase 4 Chinese Parallel Word Alignment and Tagging Part 1 V1.1 | GALE Ecorpus
LDC2009E85 | GALE Phase4 Distillation Training Queries for Relevance Judgment | GALE Ecorpus
LDC2009E87 | GALE Phase 4 DevTest BC and BN Translations | GALE Ecorpus
LDC2009E88 | GALE Phase 4 DevTest Arabic Parallel Aligned Treebank V1.1 | GALE Ecorpus
LDC2009E89 | GALE Phase 4 DevTest Chinese Word Alignment and Tagging V1.1 | GALE Ecorpus
LDC2009E91 | GALE Phase4 Distillation Training Annotation V2.0 | GALE Ecorpus
LDC2009E92 | GALE Phase 4 Release 3 ‐ Broadcast Audio | GALE Ecorpus
LDC2009E93 | GALE Phase 4 Release 3 ‐ Web Text | GALE Ecorpus
LDC2009E94 | GALE Phase 4 Release 3 ‐ Transcripts | GALE Ecorpus
LDC2009E95 | GALE Phase 4 Release 3 ‐ Translations V1.2 | GALE Ecorpus
LDC2009E98 | GALE Phase4 Distillation Training Relevance Judgments V4.0 | GALE Ecorpus
LDC2010E05 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 1 | GALE Ecorpus
LDC2010E06 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 1 | GALE Ecorpus
LDC2010E09 | GALE Phase 4 Evaluation References | GALE Ecorpus
LDC2010E13 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 2 | GALE Ecorpus
LDC2010E14 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 2 | GALE Ecorpus
LDC2010E22 | Arabic Treebank Part 10 V1.0 ‐ GALE Phase 4 BN | GALE Ecorpus

LDC2010E23 | GALE P3 Run of the Mill Translation MTPE Results | GALE Ecorpus
LDC2010E25 | GALE Phase5 DevTest Audio Source Superset Transcripts V1.0 | GALE Ecorpus
LDC2010E26 | GALE Phase5 Acoustic Variation Audio Samples | GALE Ecorpus
LDC2010E30 | GALE Phase 5 DevTest NW & WB Translations V3.0 | GALE Ecorpus
LDC2010E37 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 3 | GALE Ecorpus
LDC2010E38 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 3 | GALE Ecorpus
LDC2010E40 | English Translation Treebank, EATB Part 9 V1.0 ‐ GALE Phase 4 BN | GALE Ecorpus
LDC2010E41 | Arabic Treebank Part 8 V2.0 ‐ GALE Phase 4 BN | GALE Ecorpus
LDC2010E42 | Arabic Treebank Part 9 V2.0 ‐ GALE Phase 4 BN | GALE Ecorpus
LDC2010E43 | GALE Phase 5 DevTest BC & BN Translations V2.2 | GALE Ecorpus
LDC2010E44 | GALE Phase 5 DevTest Audio Source Transcripts | GALE Ecorpus
LDC2010E45 | GALE Phase 5 DevTest ‐ Broadcast Audio | GALE Ecorpus
LDC2010E48 | GALE Phase 5 STT‐DevTest Transcripts V2.0 | GALE Ecorpus
LDC2010E49 | GALE Phase 5 STT‐DevTest Audio | GALE Ecorpus
LDC2010E55 | OntoNotes 4.0 ‐ GALE Pre‐Release | GALE Ecorpus
LDC2010E57 | GALE Phase5 Unfiltered Audio Selection | GALE Ecorpus
LDC2010E60 | English Translation Treebank, EATB Part 10 V1.0 ‐ GALE Phase 5 BN | GALE Ecorpus
LDC2010E62 | GALE Phase 5 Distillation Segmented Eval Audio | GALE Ecorpus
LDC2010E63 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 4 | GALE Ecorpus
LDC2010E64 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 4 | GALE Ecorpus
LDC2010E71 | English Translation Treebank, EATB Part 11 V1.0 ‐ GALE Phase 5 WB | GALE Ecorpus
LDC2010E72 | Arabic Treebank Part 12 V1.0 ‐ GALE Phase 4 BN | GALE Ecorpus
LDC2010E73 | Arabic Treebank Part 14 V1.0 ‐ GALE Phase 5 BC | GALE Ecorpus
LDC2010E74 | Arabic Treebank Part 16 V1.0 ‐ GALE Phase 5 BC | GALE Ecorpus
LDC2010E77 | GALE Phase 5 Levantine Arabic Segments | GALE Ecorpus
LDC2010E79 | GALE Phase 5 Levantine Arabic Dialect Judgments and Translations | GALE Ecorpus
LDC2011E03 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 5 | GALE Ecorpus
LDC2011E04 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 4 | GALE Ecorpus
LDC2011E05 | English Translation Treebank, EATB Part 12 V1.0 (POS) ‐ GALE Phase 5 BN | GALE Ecorpus

LDC2011E16 | Arabic Treebank Part 11 V2.0 ‐ GALE Phase 4 WB | GALE Ecorpus
LDC2011E17 | Arabic Treebank Part 12 V2.0 ‐ GALE Phase 4 BN | GALE Ecorpus
LDC2011E18 | Arabic Treebank Part 13 V1.0 ‐ GALE Phase 4 WB | GALE Ecorpus
LDC2011E19 | English Translation Treebank, EATB Part 6 v2.1 ‐ GALE Phase 5 NW | GALE Ecorpus
LDC2011E21 | GALE Phase 5 Eval Source Transcripts and Translation | GALE Ecorpus
LDC2011E23 | English Translation Treebank, EATB Part 13 V1.0 ‐ GALE Phase 5 BC | GALE Ecorpus
LDC2011E25 | GALE Phase 5 Eval Superset Source Transcripts and Translation | GALE Ecorpus
LDC2011E39 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 6 | GALE Ecorpus
LDC2011E40 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 6 | GALE Ecorpus
LDC2011E50 | GALE Phase 3 and 4 Eval Superset | GALE Ecorpus
LDC2011E52 | Arabic Treebank Part 14 V2.0 ‐ GALE Phase 5 BC | GALE Ecorpus
LDC2011E53 | Arabic Treebank Part 15 V1.0 ‐ GALE Phase 5 WB | GALE Ecorpus
LDC2011E54 | Arabic Treebank Part 16 V2.0 ‐ GALE Phase 5 BC | GALE Ecorpus
LDC2011E57 | GALE Phase 5 Chinese Parallel Word Alignment and Tagging Part 7 | GALE Ecorpus
LDC2011E58 | GALE Phase 5 Arabic Parallel Aligned Treebank Part 7 | GALE Ecorpus
LDC2011E63 | English Translation Treebank, EATB Part 14 V1.0 ‐ GALE Phase 5 IA BC | GALE Ecorpus
LDC2011E64 | English Translation Treebank, ECTB Part 1 V2.1 ‐ GALE Phase 5 NW | GALE Ecorpus
LDC2011E65 | English Translation Treebank, ECTB Part 2 V2.2 ‐ GALE Phase 5 NW | GALE Ecorpus
LDC2011E66 | English Translation Treebank, EATB Part 3 V2.1 ‐ GALE Phase 5 BN | GALE Ecorpus
LDC2011E67 | English Translation Treebank, EATB Part 4 V1.1 ‐ GALE Phase 5 WB | GALE Ecorpus
LDC2011E74 | English Translation Treebank, EATB Part 5 V2.1 ‐ GALE Phase 5 NW | GALE Ecorpus
LDC2011E75 | English Translation Treebank, EATB Part 7 V1.1 ‐ GALE Phase 5 dev09 NW & WB | GALE Ecorpus
LDC2011E76 | English Translation Treebank, EATB Part 8 V1.1 ‐ GALE Phase 5 BN | GALE Ecorpus
LDC2011E77 | English Translation Treebank, EATB Part 9 V1.1 ‐ GALE Phase 5 BN | GALE Ecorpus
LDC2011E78 | English Translation Treebank, EATB Part 10 V1.1 ‐ GALE Phase 5 BN | GALE Ecorpus
LDC2011E79 | English Translation Treebank, EATB Part 11 V1.1 ‐ GALE Phase 5 WB | GALE Ecorpus
LDC2011E80 | English Translation Treebank, EATB Part 12 V1.1 (POS) ‐ GALE Phase 5 BN | GALE Ecorpus
LDC2011E81 | English Translation Treebank, EATB Part 13 V1.1 ‐ GALE Phase 5 BC | GALE Ecorpus

LDC2005G01 | GALE Kickoff Release ‐ NGA Name V1.0 | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2005G02 | GALE Kickoff Release ‐ Intel Transliteration Standard (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2005G03 | GALE Kickoff Release ‐ BGN Romanization Guide V1.0 | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2005G04 | FBIS Arabic Broadcast News Speech and Transcripts for GALE (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G01_A | GALE ‐ Chinese/English FBIS for OCR Processing Part A | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G01_B | GALE ‐ Chinese/English FBIS for OCR Processing Part B | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G01_C | GALE ‐ Chinese/English FBIS for OCR Processing Part C | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G01_D | GALE ‐ Chinese/English FBIS for OCR Processing Part D | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G01_E | GALE ‐ Chinese/English FBIS for OCR Processing Part E | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G05 | GALE Y1 Q2 Release ‐ LDC/FBIS/NVTC Parallel Text V2.0 (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G06 | GALE Y1 Q2 Release ‐ NVTC Parallel Text (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G07 | GALE Y1 ‐ BBN Iraqi Broadcast Conversation Corpus | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G08 | GALE Y1 Q3 Release ‐ FBIS Translations (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2006G09 | GALE Y1 ‐ IBM Arabic‐English Word Alignment Corpus | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2008G04 | GALE Phase 3 ‐ OSC Translations V1.0 (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2008G05 | GALE Phase 3 ‐ OSC Alignment (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2009G01 | GALE Phase 4 Release 2 ‐ Harvested Translations (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2009G02 | GALE Phase 4 Release 3 ‐ Harvested Translations (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2011G01 | GALE Phase 5 Release 1 ‐ Harvested Translations (FOUO) | GALE FOUO Ecorpus (requires extra FOUO precautions)
LDC2002E14 | Chinese English Translation Lexicon Version 3‐beta | TIDES Ecorpus
LDC2002E15 | UN Arabic English Parallel Text Version 1 beta | TIDES Ecorpus
LDC2002E16 | Hong Kong News Parallel Text Version 2 beta | TIDES Ecorpus
LDC2002E17 | English Translation of Chinese Treebank Version 1 beta | TIDES Ecorpus
LDC2002E18 | Xinhua Chinese English Parallel News Text Version 1 beta | TIDES Ecorpus

LDC2002E19 | Hong Kong Hansard Parallel Text Version 2 beta | TIDES Ecorpus
LDC2002E27 | Chinese English Translation Dictionary v3.0 | TIDES Ecorpus
LDC2002E50 | Name‐Annotated TDT Corpus Supplement for ACE | TIDES Ecorpus
LDC2002E53 | Multiple‐Translation Chinese Corpus 2.0 | TIDES Ecorpus
LDC2002E54 | Multiple‐Translation Arabic Corpus | TIDES Ecorpus
LDC2002E58 | Sinorama Chinese English Parallel Text | TIDES Ecorpus
LDC2003E01 | Chinese <‐> English Name Entity Lists Version 1.0 beta | TIDES Ecorpus
LDC2003E05 | Arabic Translation Corpus Part 1 | TIDES Ecorpus
LDC2003E06 | Chinese Treebank 3.0 | TIDES Ecorpus
LDC2003E07 | Chinese Treebank English Parallel Corpus | TIDES Ecorpus
LDC2003E08 | Chinese News Translation Corpus Part 1 | TIDES Ecorpus
LDC2003E09 | Arabic News Translation Corpus Part 2 | TIDES Ecorpus
LDC2003E14 | FBIS Multilanguage Texts | TIDES Ecorpus
LDC2003E15 | HARD GovDocs | TIDES Ecorpus
LDC2003E25 | Hong Kong News Parallel Text | TIDES Ecorpus
LDC2004E07 | Arabic News Translation Corpus Part 3 | TIDES Ecorpus
LDC2004E08 | Arabic English Parallel News Text Part 1 | TIDES Ecorpus
LDC2004E09 | Hong Kong Hansard Parallel Text | TIDES Ecorpus
LDC2004E11 | Arabic News Translation Corpus Part 4 | TIDES Ecorpus
LDC2004E12 | UN Chinese English Parallel Text | TIDES Ecorpus
LDC2004E13 | UN Arabic English Parallel Text | TIDES Ecorpus
LDC2004E38 | ACE 2003 Evaluation Data (for 2004 DevTest) | TIDES Ecorpus
LDC2004E41 | TDT5 Multilanguage Text Corpus | TIDES Ecorpus
LDC2004E42 | HARD 2004 Reference Annotations | TIDES Ecorpus
LDC2004E45 | TDT5‐2004 Reference Annotations ‐ Version 3.0 | TIDES Ecorpus
LDC2004E46 | DUC 2004 Arabic‐English Summaries | TIDES Ecorpus
LDC2004E71 | ATB Part 3 (a) v.1.1 | TIDES Ecorpus
LDC2004E72 | eTIRR Arabic English News Text | TIDES Ecorpus
LDC2005E11 | Arabic Treebank: Part 3 v.1.0 (POS + Syntactic Analysis of total corpus) | TIDES Ecorpus
LDC2005E12 | 2005 MSE Arabic‐English Clusters V1.2 | TIDES Ecorpus
LDC2005E13 | 2005 MSE Arabic‐English Summaries V1.2 | TIDES Ecorpus
LDC2005E14 | MSE 2005 Sample Summary Topic | TIDES Ecorpus
LDC2005E16 | HARD Annotations 2003 | TIDES Ecorpus
LDC2005E17 | HARD Annotations 2004 | TIDES Ecorpus
LDC2005E46 | Arabic Treebank English Translation | TIDES Ecorpus
LDC2005E47 | Chinese English News Magazine Parallel Text | TIDES Ecorpus