MUSIQ: Multi-scale Image Quality Transformer
MUSIQ: multi-scale image quality assessment with a Transformer
paper:https://arxiv.org/abs/2108.05997
code: google-research/musiq (in the google-research/google-research repository on GitHub)
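This paper proposes MUSIQ, a multi-scale image quality Transformer. It handles full-size image inputs with varying resolutions, sizes, and aspect ratios, captures image quality at different granularities, and achieves state-of-the-art performance on several large-scale IQA datasets.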
Figure 1. In CNN-based models (b), images need to be resized or cropped to a fixed shape for batch training. However, such preprocessing can alter image aspect ratio and composition, thus impacting image quality. Our patch-based MUSIQ model (a) can process the full-size image and extract multi-scale features, which aligns with the human visual system.
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ [43], SPAQ [12], and KonIQ10k [17].
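To make the multi-scale representation and the two embeddings more concrete, here is a minimal sketch of how each patch could be tagged with a scale index and a hashed 2D position index. It is an illustration under stated assumptions, not the authors' implementation: the function names, the patch size of 32, the resized longer sides of 224 and 384, and the 10 x 10 hashing grid are all assumed for the example, and only the index bookkeeping is shown (no pixel data, no learned embedding tables).

```python
import math

def resized_shape(h, w, longer_side):
    """Shape after an aspect-ratio-preserving resize (assumed resize rule)."""
    s = longer_side / max(h, w)
    return max(1, round(h * s)), max(1, round(w * s))

def patch_grid(h, w, patch=32):
    """Number of patch rows and columns, assuming the border is padded."""
    return math.ceil(h / patch), math.ceil(w / patch)

def hashed_position(i, j, n_rows, n_cols, grid=10):
    """Map patch (i, j) of an n_rows x n_cols layout onto a fixed grid x grid
    table, so images of any size share one positional embedding table."""
    ti = i * grid // n_rows
    tj = j * grid // n_cols
    return ti * grid + tj

def multiscale_token_ids(h, w, longer_sides=(224, 384), patch=32, grid=10):
    """Return one (scale_id, position_id) pair per patch across all scales.
    Scale 0 is the native-resolution image; the others are resized copies."""
    shapes = [(h, w)] + [resized_shape(h, w, side) for side in longer_sides]
    tokens = []
    for scale_id, (sh, sw) in enumerate(shapes):
        n_rows, n_cols = patch_grid(sh, sw, patch)
        for i in range(n_rows):
            for j in range(n_cols):
                pos_id = hashed_position(i, j, n_rows, n_cols, grid)
                tokens.append((scale_id, pos_id))
    return tokens

tokens = multiscale_token_ids(300, 500)
```

In a full model, `scale_id` and `position_id` would index two learned embedding tables whose vectors are added to the patch embedding before the Transformer encoder. Because the position index always falls in a fixed range regardless of image size, the same table serves inputs of any resolution or aspect ratio.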