Abstract

End-to-end learned image and video compression is a fast-growing research area. More than 100 publications have appeared in the literature in the last few years, with state-of-the-art end-to-end learned image compression showing compression performance comparable to H.266/Versatile Video Coding (VVC) intra coding in terms of Peak Signal-to-Noise Ratio measured in RGB (PSNR-RGB) and much better Multi-Scale Structural Similarity (MS-SSIM) results. End-to-end learned video coding is also catching up quickly: some preliminary studies report PSNR-RGB results comparable to H.265/High Efficiency Video Coding (HEVC) or even H.266/VVC under the low-delay setting. These results have led to intense activity in international standards organizations (e.g., JPEG AI) and in various challenges, e.g., the Challenge on Learned Image Compression (CLIC) at CVPR and the Grand Challenge on Neural Network-based Video Coding at the IEEE International Symposium on Circuits and Systems (ISCAS).

This tutorial will (1) summarize the progress on this topic over the past three or so years, including an overview of recent standardization activities in JPEG AI and MPEG Video Coding for Machines (VCM); (2) introduce the basics of learned image compression built upon variational autoencoders and/or flow models, look into the complexity aspects of learned image compression systems, and explore some recent low-complexity algorithms and architectures; (3) explore an emerging school of thought for learned video compression that leverages conditional generative models for more efficient inter-frame coding; and (4) look at the application of end-to-end learned image/video compression to computer vision tasks, an emerging research area also known as image and video coding for machines.

Outline

Time (UTC+2) Title
09:00 - 09:20 Overview of Learned Image and Video Compression (Wen-Hsiao Peng)
  1. Introduction to end-to-end learned image and video compression
  2. Rate-distortion performance of learned image and video compression
  3. Recent developments in CLIC, JPEG AI, and MPEG video coding for machines
09:20 - 10:45 End-to-End Learned Image Compression (Heming Sun)
  1. Notable Systems
  2. Fast Entropy Coding Methods
  3. Network Quantization for Learned Image Compression
  4. Implicit Neural Representation for Image Compression
  5. Real-time implementation
10:45 - 11:00 Coffee Break
11:00 - 12:10 End-to-End Learned Video Compression (Wen-Hsiao Peng)
  1. Elements of end-to-end learned video compression
  2. Learned video compression with residual-based inter-frame coding
  3. Learned video compression with conditional inter-frame coding
  4. Complexity analysis of learned video compression
12:10 - 12:40 Learned Image and Video Compression for Machines (Wen-Hsiao Peng)
  1. Single-task, multi-task, and scalable bitstreams
  2. Review of a few notable systems
  3. Transfer learning from human perception to machine perception
12:40 - 12:50 Concluding Remarks

Demos

TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception (ICCV 2023)

Project Page


F-LIC: FPGA-based Learned Image Compression with a Fine-grained Pipeline (A-SSCC 2022)

720p@30fps real-time learned codec on FPGA

Speakers

Wen-Hsiao Peng

National Yang Ming Chiao Tung University, Taiwan

Dr. Wen-Hsiao Peng (M’09-SM’13) received his Ph.D. degree from National Chiao Tung University (NCTU), Taiwan, in 2005. He was with the Intel Microprocessor Research Laboratory, USA, from 2000 to 2001. Since 2003, he has actively participated in the ISO/IEC and ITU-T video coding standardization process and contributed to the development of the SVC, HEVC, and SCC standards. He was a Visiting Scholar with the IBM Thomas J. Watson Research Center, USA, from 2015 to 2016. He has authored over 95 journal/conference papers and over 60 ISO/IEC and ITU-T standards contributions. Dr. Peng was Chair of the IEEE Circuits and Systems Society (CASS) Visual Signal Processing and Communications (VSPC) Technical Committee from 2020 to 2022. He was Technical Program Co-chair for 2021 IEEE VCIP, 2011 IEEE VCIP, 2017 IEEE ISPACS, and 2018 APSIPA ASC; Publication Chair for 2019 IEEE ICIP; Area Chair/Session Chair/Tutorial Speaker/Special Session Organizer for IEEE ICME, IEEE VCIP, and APSIPA ASC; and Track/Session Chair and Review Committee Member for IEEE ISCAS. He served as AEiC for Digital Communications for IEEE JETCAS and Associate Editor for IEEE TCSVT. He was Lead Guest Editor, Guest Editor, and SEB Member for IEEE JETCAS, and Guest Editor for IEEE TCAS-II. He was a Distinguished Lecturer of APSIPA and the IEEE CASS. Dr. Peng is also a Fellow of the Higher Education Academy (FHEA).


Heming Sun

Yokohama National University, Japan

Dr. Heming Sun received the B.E. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 2011, and received M.E. degrees from Waseda University and Shanghai Jiao Tong University in 2012 and 2014, respectively, through a double-degree program. In 2017, he earned his Ph.D. degree from Waseda University through the embodiment informatics program supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT). He was a researcher at NEC Central Research Laboratories from 2017 to 2018 and an assistant professor at Waseda University, Japan, from 2018 to 2023. He is now an associate professor at Yokohama National University, Japan. He was selected as a Japan Science and Technology Agency (JST) PRESTO Researcher from 2019 to 2023. His interests are in algorithms and VLSI architectures for image/video processing and neural networks. He participated in the 8K HEVC decoder chip design, which won the ISSCC 2016 Takuo Sugano Award for Outstanding Far-East Paper. He has also received several awards, including the Best Paper Award of VCIP 2020, Top-10 Best Paper of PCS 2021, and the IEEE Computer Society Japan Chapter Young Author Award 2021. He has published over 80 peer-reviewed journal and conference papers (e.g., TMM, JSSC, TCAS-I, ISSCC, CVPR, VCIP, ISCAS). He held a special session on “Neural Network Technology in Future Image/Video Coding” at the Picture Coding Symposium (PCS) 2019 and co-organized the special session on “Towards Practical Learning-based Image and Video Coding” at PCS 2022. He has been invited by the Information Processing Society of Japan to give a talk on “Deep Learning Method for Image Compression”. He has also served as a reviewer for many flagship CAS Society journals, such as TCSVT, TCAS-I, and TCAS-II.