Thanks for the great work and dataset!
I have a question regarding the Elo rating process mentioned in the paper.
Comparison Scope:
When collecting pairwise judgments, were comparisons made across different reference images (e.g., a distorted version of A0001 vs. a distorted version of A0002), or were they strictly restricted to distortions of the same reference image (e.g., A0001_00_00 vs. A0001_01_01)?
Comparability of MOS:
If the comparisons were restricted within the same reference group, does this mean the absolute MOS values are not comparable across different reference images?
For example:
A0001_00_00.bmp: MOS ~1520
A0002_00_00.bmp: MOS ~1466
Can we infer that A0001_00_00 has better perceptual quality than A0002_00_00, or are these scores only valid as relative rankings within their respective reference groups (e.g., A0001_00_00 vs. A0001_01_01)?
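To make the concern concrete, here is a minimal sketch. It assumes a standard Elo update (starting rating 1500, K = 32) and purely hypothetical judgments. I don't know the exact settings used in the paper. Because each standard update moves points from the loser to the winner, the ratings inside a group always sum to the same total. If comparisons never cross groups, every group's mean therefore stays at 1500, no matter how good or bad its images really are. The helper names (`rate_within_group`, etc.) and the latent quality values are illustrative only.

```python
import random
from statistics import mean

INITIAL = 1500.0  # assumed starting rating for every image
K = 32.0          # assumed K-factor; the paper's value may differ


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(ratings: dict, winner: str, loser: str) -> None:
    """One standard Elo update: the winner gains exactly the points the loser gives up."""
    delta = K * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rate_within_group(latent: dict, n_matches: int, seed: int) -> dict:
    """Hypothetical helper: run pairwise comparisons ONLY among images of one reference group."""
    rng = random.Random(seed)
    names = list(latent)
    ratings = {name: INITIAL for name in names}
    for _ in range(n_matches):
        a, b = rng.sample(names, 2)
        better, worse = (a, b) if latent[a] > latent[b] else (b, a)
        # Hypothetical judge: prefers the truly better image 80% of the time.
        winner, loser = (better, worse) if rng.random() < 0.8 else (worse, better)
        elo_update(ratings, winner, loser)
    return ratings


# Hypothetical latent qualities: every A0001 image is truly better than every A0002 image.
group_a0001 = {"A0001_00_00": 0.9, "A0001_01_01": 0.8, "A0001_02_02": 0.7}
group_a0002 = {"A0002_00_00": 0.3, "A0002_01_01": 0.2, "A0002_02_02": 0.1}

ratings_a0001 = rate_within_group(group_a0001, n_matches=300, seed=0)
ratings_a0002 = rate_within_group(group_a0002, n_matches=300, seed=1)

# Both group means stay at INITIAL, so the true quality gap between groups never shows up.
print(round(mean(ratings_a0001.values()), 6), round(mean(ratings_a0002.values()), 6))
```

Under these assumptions, a score like 1520 vs. 1466 would only say where each image sits within its own group, not which image is better overall. That is why I'm asking whether any cross-reference comparisons (or some other anchoring step) were used.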
Clarification on this would be very helpful for my experiments. Thanks!