Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)

Dress like an Internet Celebrity: Fashion Retrieval in Videos

Hongrui Zhao1, Jin Yu2, Yanan Li3, Donghui Wang1∗, Jie Liu2, Hongxia Yang2 and Fei Wu1
1College of Computer Science and Technology, Zhejiang University
2Alibaba Group
3Institute of Artificial Intelligence, Zhejiang Lab
{hrzhao, ynli, dhwang,
[email protected], {kola.yu, sanshuai.lj, yang.yhx}@alibaba-inc.com

[Figure 1: two block diagrams. Panel (a) labels: Select Manually, DT, IFN, LSTM, Similarity, Same/Different. Panel (b) labels: Detect, FD, Keyframe, IFN, Similarity, Same/Different.]

Figure 1: (a) shows the previous video-to-shop pipeline, which manually extracts the frames containing the corresponding clothing in videos as clothing trajectories, then performs DT (detection and tracking) on those trajectories, and finally formulates retrieval as a multiple-to-single matching problem after feature construction using IFN (image feature network) and LSTM.

Abstract

Nowadays, both online shopping and video sharing have grown exponentially. Although internet celebrities in videos are an ideal showcase for fashion corporations to sell their products, audiences do not always know where to buy the fashion products shown in videos; this is a cross-domain problem called video-to-shop. In this paper, we propose a novel deep neural network, called Detect, Pick, and Retrieval Network (DPRNet), to bridge the gap between fashion products in videos and audiences. For the video side, we modify the traditional object detector so that it automatically picks out the best object proposals for every commodity in videos without duplication, which improves performance on the video-to-shop task. For the fashion retrieval side, a simple but effective multi-task loss network obtains new state-of-the-art re-