A Study on Summarizing and Evaluating Long Documents
Anonymous ACL submission

Abstract

Text summarization has been a key language generation task for over 60 years. The field has advanced considerably during the past two years, benefiting from the proliferation of pre-trained Language Models (LMs). However, the field is constrained by two factors: 1) the absence of an effective automatic evaluation metric and 2) a lack of effective architectures for long document summarization. Our first contribution is to demonstrate that a set of semantic evaluation metrics (BERTScore, MoverScore and our novel metric, BARTScore) consistently and significantly outperforms ROUGE. Using these metrics, we then show that combining transformers with sparse self-attention is a successful method for long document summarization and is competitive with the state of the art.

...code the input sequence into a single vectorial representation and another RNN to extract the target sequence. This approach allowed the generation of sequences of arbitrary length conditioned upon an input document and was therefore adopted by the summarization community as the first viable attempt at abstractive summarization (e.g., See et al., 2017; Nallapati et al., 2016). The pace of progress further accelerated by leveraging transformers (Vaswani et al., 2017), as these are able to generate outputs with higher fluency and coherence than was previously possible. This improvement in performance drove a push into a wider and more diversified range of datasets, broadening from summarizing short news articles (e.g., Hermann et al., 2015; Narayan et al., 2018) to generating news headlines (Rush et al., 2015), longer scientific docu-
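The weakness of ROUGE that motivates the semantic metrics above is that it rewards only surface n-gram overlap. A minimal sketch of ROUGE-1 recall (simplified: lowercased whitespace tokens, no stemming or stopword handling, not the official ROUGE toolkit) makes the failure mode concrete:

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    # ROUGE-1 recall: fraction of reference unigrams that also
    # appear in the candidate (clipped by candidate counts).
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

# A faithful paraphrase scores zero because no tokens match,
# even though the meaning is preserved:
rouge1_recall("results improved markedly", "performance rose sharply")  # -> 0.0
```

Embedding-based metrics such as BERTScore and MoverScore compare token representations rather than token strings, so they can credit exactly this kind of paraphrase.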
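The sparse self-attention mentioned in the abstract replaces the full O(n^2) attention pattern with a restricted one so that long documents fit in memory. A minimal NumPy sketch of one common variant, a local (sliding-window) mask, is shown below; the single-head, no-projection setup and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    # Boolean mask: token i may attend only to tokens within `window`
    # positions on either side, instead of the full quadratic pattern.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def sparse_self_attention(q, k, v, window: int) -> np.ndarray:
    # q, k, v: (seq_len, d) arrays; a single head with no learned
    # projections, to isolate the masking idea.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    mask = local_attention_mask(q.shape[0], window)
    scores = np.where(mask, scores, -np.inf)   # disallowed positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Each row of the mask has at most 2*window + 1 allowed positions, so the cost per token is constant in sequence length rather than linear, which is what makes transformer summarization of long documents tractable.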