Perceptual Video Quality Assessment and Analysis Using Adaptive Content Dynamics
Total Page:16
File Type:pdf, Size:1020Kb
PERCEPTUAL VIDEO QUALITY ASSESSMENT AND ANALYSIS USING ADAPTIVE CONTENT DYNAMICS A Dissertation Presented to The Academic Faculty by Mohammed A. Aabed In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Electrical and Computer Engineering Georgia Institute of Technology May 2017 Copyright c 2017 by Mohammed A. Aabed PERCEPTUAL VIDEO QUALITY ASSESSMENT AND ANALYSIS USING ADAPTIVE CONTENT DYNAMICS Approved by: Professor Ghassan I. AlRegib, Advisor Professor Elliot Moore II School of Electrical and Computer School of Electrical and Computer Engineering Engineering Georgia Institute of Technology Georgia Institute of Technology Professor Anthony J. Yezzi Professor Berdinus A. Bras School of Electrical and Computer The George W. Woodruff School of Engineering Mechanical Engineering Georgia Institute of Technology Georgia Institute of Technology Professor David V. Anderson Date Approved: December 21st, 2016 School of Electrical and Computer Engineering Georgia Institute of Technology In the memory of my mother, to my father, and to my brothers iii ACKNOWLEDGEMENTS I would like to express my utmost gratitude to my advisor, Prof. Ghassan AlRegib, a man whose guidance, patience and generosity have always made him a trusted mentor and a dear close friend. His continuous teaching, friendship, support, and understanding throughout this process have allowed me to learn, innovate, enjoy and made it a rewarding experience. I would like to thank Dr. Anthony J. Yezzi and Prof. David V. Anderson for their time, valuable comments and for agreeing to serve as reading remember of my dissertation committee. I would like to also thank Prof. Elliot Moore II and Prof. Bert Bras for their time, comments, and for agreeing to serve on my defense committee. This dissertation would not have seen the light without a great working environ- ment of the Multimedia and Sensors Lab (MSL) and Center for Signal and Information Processing (CSIP). I thank all CSIP faculty, staff, and students. I also would like to thank all my friends and colleagues within MSL and CSIP. Special thanks go to my dear friends and colleagues, Dr. Dogancan Temel, Dr. Mashhour Solh, Dr. Sami Almalfouh and Ms. Patricia Dixon for their valuable friendships, collaborations, and support over the years. My thanks go to the academic office staff at the School of Electrical and Computer Engineering at the Georgia Institute of Technology. I am forever in debt to my family for their endless care, encouragement, patience, and unconditional support. My everlasting gratitude goes to my first and foremost teacher, my mother, for her teachings, guidance, care and dedication. I am eternally grateful to my father for the continuous dedication, care, support and wise guidance. Special thanks go to my brother, Dr. Mohanad Aabed, MD, for his unconditional support and encouragement over the years. My gratitude goes also to my brothers, iv Mr. Khaled Abed and Mr. Saed Aabed, for their support and encouragement. I would like to thank all my friends who supported me through this endeavor. Special thanks go to Dr. Osama Mohamad, MD, for his valuable friendship and support over the years. \In the sweetness of friendship let there be laughter, and sharing of pleasures. For in the dew of little things the heart finds its morning and is refreshed."1 To all of you my family, teachers, friends, and colleagues, I humbly thank you. 1Khalil Gibran v TABLE OF CONTENTS ACKNOWLEDGEMENTS .......................... iv LIST OF TABLES ............................... viii LIST OF FIGURES .............................. xi SUMMARY .................................... xiii I INTRODUCTION ............................. 1 1.1 Contributions and Dissertation Organization.............5 II BACKGROUND AND PRIOR ART ................. 8 2.1 Digital Video Streaming and Compression...............8 2.1.1 Video Coding Fundamentals..................9 2.1.2 The Beginning and Early Generations of Video Codecs... 14 2.1.3 H.264/MPEG-4 Part 10 Advanced Video Coding (AVC)... 16 2.1.4 H.265/MPEG-H Part 2 High Efficiency Video Coding (HEVC) 19 2.1.5 Beyond ITU-T and MPEG: Towards Royalty-Free Video Stan- dards............................... 21 2.2 Distortions in Videos.......................... 25 2.3 Human Visual System and Visual Perception............. 28 2.4 Perceptual Video Quality Assessment................. 32 2.5 Literature Survey............................ 36 2.5.1 Perceptual VQA Fundamentals................. 36 2.5.2 VQA Metrics and Algorithms.................. 38 III PERCEPTUAL QUALITY ASSESSMENT USING OPTICAL FLOW FEATURES (PEQASO) ......................... 45 3.1 Optical Flow Preliminaries....................... 46 3.1.1 Video Quality Monitoring Using Optical Flow........ 47 3.2 Experiments and Results........................ 51 3.2.1 Video Quality Assessment Databases............. 52 vi 3.2.2 Performance Metrics and Auxiliary Formulation....... 54 3.2.3 Results and Validation..................... 55 3.2.4 Computational Complexity................... 61 3.2.5 Limitations............................ 61 IV PERCEPTUAL QUALITY ASSESSMENT VIA POWER SPEC- TRAL ANALYSIS ............................. 64 4.1 No-Reference Frame-Level Distortion Estimation........... 65 4.1.1 Experiments and Results.................... 66 4.2 Power of Tempospatially Unified Spectral Density (POTUS).... 71 4.2.1 3D Power Spectral Density................... 71 4.2.2 Perceptual Video Quality Assessment via Tempospatial Power Spectral Analysis........................ 74 4.2.3 Visual Perception in POTUS.................. 79 4.2.4 Experiments and Results.................... 81 V FEATURES TO PERCEPTION: TEMPOSPATIAL POOLING STRATE- GIES FOR DIFFERENT VISUAL DISTORTION MAPS .... 87 5.1 Tempospatial Video Pooling for PVQA................ 87 5.2 Experiments and Results........................ 89 5.2.1 Databases and Test Videos................... 89 5.2.2 Results and Analysis...................... 91 5.A Best Performing Pooling Strategies for the Tested Distortion Maps. 97 VI CONCLUSION AND FUTURE DIRECTIONS ........... 107 6.1 Future Research Directions....................... 109 REFERENCES .................................. 111 VITA ........................................ 124 vii LIST OF TABLES 1 New features introduced in HEVC compared to AVC.......... 22 2 A summary of VQA databases...................... 33 3 A summary of the main features used common and recent algorithms and metrics for VQA............................ 41 4 A summary of common and recent algorithms and metrics for VQA.. 44 5 SROCC and PLOCC for the proposed metric and a set of the most common video and images quality assessment metrics tested on se- quences with H.264 compression artifacts only.............. 57 6 SROCC and PLOCC for the proposed metric and a set of the most common video and images quality assessment metrics tested on se- quences with channel-induced distortion................. 57 7 Statistical correlation of the proposed metric for different temporal and spatial features for sequences with H.264 compression artifacts..... 58 8 Statistical correlation of the proposed metric for different temporal and spatial features for sequences with channel-induced distortion..... 58 9 Computational complexity comparison of the proposed algorithms in this work and other metrics. The numbers in the table show average computation times for 120 frames..................... 63 10 Test Video Sequences........................... 67 11 Correlation between the estimated frame distortion, Dk, and the full- reference SSIM values........................... 68 12 Normalizations parameters values for the three databases....... 80 13 SROCC for the proposed metric and a set of the most common video and images quality assessment metrics tested on all sequences in the three databases............................... 82 14 PLOCC for the proposed metric and a set of the most common video and images quality assessment metrics tested on all sequences in the three databases............................... 83 15 Spatial and temporal decomposition correlation coefficients...... 86 16 Summary of the VQA databases used in this study and their video contents................................... 91 viii 17 Number of occurrences of video pooling operations among the best three scenarios of SROCC for each distortion map. The red cells in- dicate that the operation's frequency is 35% or higher, and the blue ones indicate 20%-35% frequency. These numbers are based on the statistics in Tables 19,23, 27........................ 95 18 Number of occurrences of video pooling operations among the best three scenarios of PLOCC for each distortion map. The red cells in- dicate that the operation's frequency is 35% or higher, and the blue ones indicate 20%-35% frequency. These numbers are based on the statistics in Tables 20,24, 28........................ 95 19 The best pooling strategies for full reference squared errordistortion maps (spatial pooling, temporal pooling, SROCC, average frames per sequence).................................. 98 20 The best pooling strategies for full reference squared errordistortion maps (spatial pooling, temporal pooling, PLOCC, average frames per sequence).................................. 99 21 Spearman correlation coefficients statistical significance of the tested spatial pooling strategies using SE distortion map............ 100 22 Spearman correlation coefficients statistical significance of the tested temporal pooling strategies using SE distortion map.......... 100 23 The best pooling