It Takes Two to TANGO: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports

Nathan Cooper∗, Carlos Bernal-Cárdenas∗, Oscar Chaparro∗, Kevin Moran†, Denys Poshyvanyk∗
∗College of William & Mary (Williamsburg, VA, USA), †George Mason University (Fairfax, VA, USA)
Abstract—When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowdsourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers are likely to face challenges related to manually identifying videos that depict duplicate bugs. Due to their graphical nature, screen-recordings present challenges for automated analysis that preclude the use of current duplicate bug report detection techniques.

into mobile apps, developers are likely to face a growing set of challenges related to processing and managing app screen-recordings in order to triage and resolve bugs, and hence to maintain the quality of their apps.

One important challenge that developers will likely face in relation to video-related artifacts is determining whether two videos depict and report the same bug (i.e., detecting duplicate video-based bug reports), a task that is currently performed for textual bug reports [27, 86, 87]. When video-based bug reports are collected at scale, either via a crowdsourced testing service [8–13, 15, 17–19] or by popular apps, the sizable