AUG 1 4 2006 BARKER LIBRARIES System for Rapid Subtitling By
System for Rapid Subtitling by Sean Joseph Leonard Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 12, 2005 ( 2005 Sean Joseph Leonard. All rights reserved. The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others t right to do so. A u th or....................... ............................................ Dep rtment of Electrical Engineering and Computer Science 7/ A / f\ September 12, 2005 C ertified b y ......... ......... ,.................. ... ............................................ Harold Abelson Class of 1922-Professor of Computer Science and Engineering Thesis Supervisor Accepted by. .. Arthur C. Smith Chairman, Department Committee on Graduate Theses MASSACHUSETTS INS E OF TECHNOGY AUG 1 4 2006 BARKER LIBRARIES System for Rapid Subtitling by Sean Joseph Leonard Submitted to the Department of Electrical Engineering and Computer Science on September 12, 2005, in partial fulfillment of the requirements for the degree of Master of Engineering Abstract A system for rapid subtitling of audiovisual sequences was developed, and evaluated. This new system resulted in average time-savings of 50% over the previous work in the field. To subtitle a 27-minute English lecture, users saved 2.2 hours, spending an average of 1.9 hours timing the monologue. Subtitling a 20-minute Japanese animation resulted in similar savings. While subtitling is widely practiced, traditional speech boundary detection algorithms have proven insufficient for adoption when subtitling audiovisual sequences. Human-edited timing of subtitles is a dull and laborious exercise, yet with some human-guidance, basic systems may accurately align transcripts and translations in the presence of divergent speakers, multiple languages, and background noise.
[Show full text]