Data Available for Bolt Performers

Data Created During BOLT; corpora automatically distributed to performers. Contact [email protected] to obtain any missing corpora.

Catalog ID | Title | Release Date | Type | Description | Source Language Where Relevant/Known (arz = Egyptian Arabic; cmn = Mandarin Chinese; eng = English) | Genre Where Relevant/Known | arz data volume (words unless otherwise specified) | cmn data volume (words unless otherwise specified; 1 char = 1.5 words) | eng data volume (words unless otherwise specified)
LDC2011E115 | BOLT ‐ Sample Discussion Forums | 12/22/2011 | Source | Discussion forums sample for elicitation of feedback on format, structure, etc. | arz, cmn, eng | discussion forum | 67815 | 106130 | 142491
LDC2012E04 | BOLT ‐ Phase 1 Discussion Forums Source Data R1 V2 | 3/29/2012 | Source | Discussion forums source data | arz, cmn, eng | discussion forum | 33871338 | 36244922 | 29658002
LDC2012E16 | BOLT ‐ Phase 1 Discussion Forums Source Data R2 | 3/22/2012 | Source | Discussion forum source data | arz, cmn, eng | discussion forum | 118519987 | 264314806 | 273078669
LDC2012E21 | BOLT ‐ Phase 1 Discussion Forums Source Data R3 | 4/24/2012 | Source | Discussion forums source data | arz, cmn, eng | discussion forum | 127832646 | 279763913 | 282588862
LDC2012E54 | BOLT ‐ Phase 1 Discussion Forums Source Data R4 | 5/31/2012 | Source | Discussion forums source data | arz, cmn, eng | discussion forum | 368199350 | 838056761 | 676989452
LDC2012E62 | BOLT ‐ Phase 1 Rejected Training Data Thread IDs | 6/1/2012 | Source | List of threads rejected during triage for BOLT translation training data | n/a | discussion forum | n/a | n/a | n/a
LDC2012E82 | BOLT Phase 1 IR Eval Source Data Document List | 6/29/2012 | Source | List of source documents for IR evaluation | arz, cmn, eng | discussion forum | 400036669 | 400168661 | 400219116
LDC2013E08 | BOLT Phase 2 IR Source Data Document List and Sample Query | 1/31/2012 | Source | Discussion forum source documents for support of P2 IR | arz | discussion forum | 616719471 | n/a | n/a
LDC2013E10 | BOLT Phase 2 SMS and Chat Sample Source Data V1.1 | 3/5/2013 | Source | SMS/chat sample for elicitation of feedback on format, structure, etc. | arz, cmn, eng | SMS, chat | 879 | 8424 | 6709
LDC2013E123 | BOLT Phase 2 SMS/Chat Source Data R4 | 11/15/2013 | Source | SMS/chat source data | arz | SMS, chat | 213516 | n/a | n/a
LDC2013E49 | BOLT Phase 2 SMS/Chat Source Data R1 V2 | 6/4/2013 | Source | SMS/chat source data | arz, cmn | SMS, chat | 1958 | 10029 | n/a
LDC2013E63 | BOLT Phase 2 SMS/Chat Source Data R2 V2 | 7/12/2013 | Source | SMS/chat source data | arz, cmn | SMS, chat | 3829 | 280771 | n/a
LDC2013E84 | BOLT Phase 2 SMS/Chat Source Data R3 | 9/25/2013 | Source | SMS/chat source data | arz, cmn | SMS, chat | 95821 | 1585304 | n/a
LDC2012E11 | BOLT ‐ Phase 1 Translation Samples V2 | 3/6/2012 | Translation | Translation training data sample release for BOLT P1 | arz, cmn | discussion forum | 7 docs | 17 docs | n/a
LDC2012E124 | BOLT Phase 1 Translation Training Data R6 | 10/17/2012 | Translation | Incremental parallel text training data release | arz, cmn | discussion forum | 320887 | 459588 chars | n/a
LDC2012E15 | BOLT Phase 1 Translation Training Data R1 | 4/19/2012 | Translation | Incremental parallel text training data release | arz, cmn | discussion forum | 90581 | 300257 chars | n/a
LDC2012E18 | BOLT Phase 1 HTER Experiment Source and Reference Translation | 3/27/2012 | Translation | Source and translation files for BOLT P1 HTER experiment | arz, cmn | discussion forum | 4792 | 9789 chars | n/a
LDC2012E19 | BOLT Phase 1 Translation Training Data R2 | 4/30/2012 | Translation | Incremental parallel text training data release | arz, cmn | discussion forum | 116165 | 52088 chars | n/a
LDC2012E30 | BOLT Phase 1 DevTest Source and Translation V4 | 6/25/2012 | Translation | Source and translation files for BOLT P1 Devtest | arz, cmn | discussion forum | 60296 | 58929 | n/a
LDC2012E55 | BOLT Phase 1 Translation Training Data R3 | 5/31/2012 | Translation | Incremental parallel text training data release | arz, cmn | discussion forum | 311487 | 134284 chars | n/a
LDC2012E81 | BOLT Phase 1 Translation Training Data R4 | 6/20/2012 | Translation | Incremental parallel text training data release | arz, cmn | discussion forum | 116073 | 253504 chars | n/a
LDC2012E96 | BOLT Phase 1 Translation Training Data R5 | 8/3/2012 | Translation | Incremental parallel text training data release | arz, cmn | discussion forum | 214406 | 447263 chars | n/a
LDC2013E118 | BOLT Phase 2 Translation Training Data R3 | 10/11/2013 | Translation | Incremental parallel text training data release | arz, cmn | SMS, chat | 7928 | 200024 | n/a
LDC2013E119 | BOLT Phase 2 SMS and Chat DevTest Gold Standard Translation | 10/18/2013 | Translation | Gold standard translation release | cmn | SMS, chat | n/a | 5000 | n/a
LDC2013E125 | BOLT Phase 2 Translation Training Data R4 | 11/27/2013 | Translation | Incremental parallel text training data release | arz, cmn | SMS, chat | 39796 | 212386 | n/a
LDC2013E132 | BOLT Phase 2 Translation Training Data R5 | 12/20/2013 | Translation | Incremental parallel text training data release | cmn | SMS, chat | n/a | 200076 | n/a
LDC2013E135 | BOLT Phase 2 Arabizi Transliteration Translation Experiment | 1/7/2014 | Translation | 4 English translations of an Egyptian Arabic source file | arz | n/a | n/a | n/a | n/a
LDC2013E59 | BOLT Phase 2 Discussion Forum DevTest Gold Standard Translation | 7/3/2013 | Translation | Gold standard translation for DevTest discussion forum data | arz, cmn | discussion forum | 4942 | 5044 | n/a
LDC2013E80 | BOLT Phase 2 Translation DevTest Data R1 | 8/9/2013 | Translation | Translation files for DevTest data | cmn | SMS, chat | n/a | 11621 | n/a
LDC2013E81 | BOLT Phase 2 Translation Training Data R1 | 8/9/2013 | Translation | Incremental parallel text training data release | cmn | SMS, chat | n/a | 10260 | n/a
LDC2013E83 | BOLT Phase 2 Translation DevTest Data R2 | 8/26/2013 | Translation | Translation files for DevTest data | cmn | SMS, chat | n/a | 31592 | n/a
LDC2013E85 | BOLT Phase 2 Translation Training Data R2 | 9/13/2013 | Translation | Incremental parallel text training data release | cmn | SMS, chat | n/a | 187205 | n/a
LDC2013E92 | BOLT Phase 2 Additional Discussion Forum Translation DevTest Data | 8/23/2013 | Translation | Translation for discussion forum DevTest data | arz, cmn | discussion forum | 81928 | 50168 chars | n/a
LDC2013E94 | BOLT Phase 2 Arabic CTS Translation Data R1 V2 | 3/18/2014 | Translation | Incremental parallel text training data release | arz | conversational telephone speech | 122534 | n/a | n/a
LDC2014E08 | BOLT Phase 3 Translation Training Data R1 V2 | 2/20/2014 | Translation | Incremental parallel text training data release | cmn | conversational telephone speech | n/a | 199993 | n/a
LDC2014E09 | BOLT Phase 2 Translation DevTest Data R3 | 2/14/2014 | Translation | Translation files for DevTest data | arz | SMS, chat | 35937 | n/a | n/a
LDC2014E18 | BOLT Phase 2 Translation Training Data R6 | 2/28/2014 | Translation | Incremental parallel text training data release | arz | SMS, chat | 358102 | n/a | n/a
LDC2014E19 | BOLT Phase 3 Translation With Audio Experiment V2 | 3/10/2014 | Translation | Translation experiment with audio access | arz, cmn | conversational telephone speech | 3609 | 3718 | n/a
LDC2014E25 | BOLT Phase 2 SMS and Chat DevTest Gold Standard Translation R2 | 3/25/2014 | Translation | Gold standard translation for SMS/chat data | arz | SMS, chat | 4987 | n/a | n/a
LDC2013E121 | BOLT Phase 2 Egyptian Arabic SMS and Chat Transliterated Sample Conversations | 10/30/2013 | Transliteration | Transliteration of Arabic SMS/chat data | arz | SMS, chat | 40304 | n/a | n/a
LDC2013E131 | BOLT Phase 2 Egyptian Arabic SMS and Chat Transliterated Sample Conversations with Manual Correction | 12/3/2013 | Transliteration | Manually corrected transliteration of SMS/chat data | arz | SMS, chat | 3784 | n/a | n/a
LDC2012E24 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 1 | 6/8/2012 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 59579 | n/a
LDC2012E51 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment Part 1 V2 | 7/10/2012 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 68762 | n/a | n/a
LDC2012E72 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 2 | 7/10/2012 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 101957 | n/a
LDC2012E94 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 2 v2 | 8/7/2012 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 49334 | n/a | n/a
LDC2012E95 | BOLT Phase 1 ‐ Chinese Parallel Word Alignment and Tagging Part 3 | 8/7/2012 | Word Alignment | Word aligned Chinese discussion forum data | cmn | discussion forum | n/a | 102167 | n/a
LDC2013E01 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF | 1/31/2013 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 38610 | n/a | n/a
LDC2013E02 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part 4 | 1/31/2013 | Word Alignment | Word aligned Chinese‐English discussion forum parallel text | cmn | discussion forum | n/a | 166388 | n/a
LDC2013E09 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 4 | 2/28/2013 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 54903 | n/a | n/a
LDC2013E25 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 5 | 3/28/2013 | Word Alignment | Word aligned Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 98000 | n/a | n/a
LDC2013E31 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 6 | 4/12/2013 | Word Alignment | Word alignment Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 61112 | n/a | n/a
LDC2013E43 | BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 7 | 5/9/2013 | Word Alignment | Word alignment Egyptian Arabic‐English discussion forum parallel text | arz | discussion forum | 65054 | n/a | n/a
LDC2013E51 | BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part | | | Word aligned Chinese‐English | | | | |
