Current quality assessment (QA) algorithms aim to generate scores for natural images that are consistent with subjective
scores for the quality assessment task, in which human observers evaluate a natural
image based on its perceptual resemblance to a reference. Natural images communicate useful information to
humans, and this paper investigates the utility assessment task, where human observers evaluate the usefulness of
a natural image as a surrogate for a reference. Current QA algorithms implicitly assess utility insofar as an image
that exhibits strong perceptual resemblance to a reference is also of high utility. However, a perceived quality
score is not a proxy for a perceived utility score: a decrease in perceived quality may not affect the perceived
utility. Two experiments are conducted to investigate the relationship between the quality assessment and utility
assessment tasks. The results from these experiments provide evidence that any algorithm optimized to predict
perceived quality scores cannot immediately predict perceived utility scores. Several QA algorithms are evaluated
in terms of their ability to predict subjective scores for the quality and utility assessment tasks. Among the QA
algorithms evaluated, the visual information fidelity (VIF) criterion, which is frequently reported to provide the
highest correlation with perceived quality, predicted both perceived quality and utility scores reasonably well. The
consistent performance of VIF for both tasks raised suspicions in light of the evidence from the psychophysical
experiments. A thorough analysis of VIF revealed that it artificially emphasizes evaluations at finer image scales
(i.e., higher spatial frequencies) over those at coarser image scales (i.e., lower spatial frequencies). A modified
implementation of VIF, denoted VIF*, is presented that provides statistically significant improvement over VIF
for the quality assessment task and significantly worse performance for the utility assessment task. A novel utility
assessment algorithm, referred to as the natural image contour evaluation (NICE), is introduced that compares
the contours of a test image to those of a reference image across multiple image scales to score the
test image. NICE demonstrates a viable departure from traditional QA algorithms that incorporate energy-based
approaches and is capable of predicting perceived utility scores.
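
The idea of comparing contours across multiple image scales can be illustrated with a minimal sketch. It is not the NICE algorithm as specified by the authors: the abstract does not state which edge detector, scale decomposition, or distance measure is used, so everything below is an assumption made for illustration. Thresholded gradient magnitude stands in for the edge detector, 2x2 block averaging stands in for the multiscale decomposition, and the score is the number of mismatched contour pixels divided by the number of reference contour pixels. The names `gradient_magnitude`, `downsample`, `contour_distance`, `rel_thresh`, and `n_scales` are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image (stand-in edge strength)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h = img.shape[0] // 2 * 2
    w = img.shape[1] // 2 * 2
    img = img[:h, :w].astype(float)
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def contour_distance(reference, test, n_scales=3, rel_thresh=0.2):
    """Illustrative contour mismatch between a test and a reference image.

    Assumption: at each scale, a pixel is a contour pixel if its gradient
    magnitude exceeds rel_thresh times the reference's peak gradient.
    Returns 0.0 for identical contour maps; larger values mean more mismatch.
    """
    ref = reference.astype(float)
    tst = test.astype(float)
    mismatched = 0
    ref_contour_pixels = 0
    for _ in range(n_scales):
        mag_ref = gradient_magnitude(ref)
        mag_tst = gradient_magnitude(tst)
        thresh = rel_thresh * mag_ref.max()
        e_ref = mag_ref > thresh
        e_tst = mag_tst > thresh
        # Count pixels that are a contour in one image but not in the other.
        mismatched += np.count_nonzero(e_ref ^ e_tst)
        ref_contour_pixels += np.count_nonzero(e_ref)
        ref, tst = downsample(ref), downsample(tst)
    return mismatched / max(ref_contour_pixels, 1)
```

In this sketch, a test image that preserves the reference contours scores near zero even if its pixel intensities differ elsewhere, which is the sense in which a contour-based comparison departs from energy-based comparisons.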