Hi,
in the scoring formula on page 7 of the paper, shouldn't the KL divergence of the classifier prediction from the uniform distribution be small for OOD inputs, while the rotation CE is large on OOD inputs, since the rotation head has not been trained to predict the original rotation there? In other words, the two terms move in opposite directions on OOD data, so one of them should have a minus sign, right?
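To make the direction of each term concrete, here is a toy pure-Python sketch. The softmax outputs, the choice K = 4, and the helpers `kl`/`ce` are all hypothetical and made up for illustration; they are not taken from the paper or the repo.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

K = 4                      # toy number of classes (hypothetical)
UNIFORM = [1.0 / K] * K
TRUE_ROT = [1, 0, 0, 0]    # one-hot target: rotation index 0 was applied

# Hypothetical softmax outputs, chosen only for illustration.
samples = {
    "in-dist": {"cls": [0.94, 0.02, 0.02, 0.02], "rot": [0.90, 0.05, 0.03, 0.02]},
    "OOD":     {"cls": [0.28, 0.24, 0.25, 0.23], "rot": [0.30, 0.30, 0.20, 0.20]},
}

results = {}
for name, out in samples.items():
    kl_u = kl(UNIFORM, out["cls"])     # expected to be small on OOD
    rot_ce = ce(TRUE_ROT, out["rot"])  # expected to be large on OOD
    results[name] = {
        "kl_u": kl_u,
        "rot_ce": rot_ce,
        "same_sign": kl_u + rot_ce,       # both terms added with the same sign
        "opposite_sign": -kl_u + rot_ce,  # KL term negated
    }

for name, r in results.items():
    print(name, {k: round(v, 3) for k, v in r.items()})
```

With these made-up numbers the same-sign sum actually ranks the in-distribution sample higher than the OOD one (about 1.67 vs 1.21), while negating the KL term flips the ordering so the OOD sample scores higher.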
If I read it correctly, the code uses different signs for those terms:
```python
classification_loss = -1 * kl_div(class_uniform_dist, classification_smax)

rot_one_hot = torch.zeros_like(rot_smax).scatter_(1, target_rots.unsqueeze(1).cuda(), 1)

rot_loss = kl_div(rot_one_hot, rot_smax)
```
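where KL(U ‖ p) is just the cross-entropy CE(U, p) minus the entropy of U, which is a constant (log K for K classes), and for the one-hot rotation target KL and CE coincide, since a one-hot distribution has zero entropy.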
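A quick numerical check of that identity, as a pure-Python sketch. The helpers `kl`, `ce` and `entropy` and the prediction `pred` are hypothetical and defined here only for illustration; they are not the repo's `kl_div`, whose argument order and implementation I have not checked.

```python
import math

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); zero-probability p_i are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

K = 10
uniform = [1.0 / K] * K
pred = [0.5] + [0.5 / (K - 1)] * (K - 1)   # hypothetical softmax output

# KL(U || p) = CE(U, p) - H(U), and H(U) = log K does not depend on the input.
assert math.isclose(kl(uniform, pred), ce(uniform, pred) - entropy(uniform))
assert math.isclose(entropy(uniform), math.log(K))

# For a one-hot target the entropy is 0, so KL and CE are the same number.
one_hot = [1.0] + [0.0] * (K - 1)
assert math.isclose(kl(one_hot, pred), ce(one_hot, pred))
```

Because log K is the same for every input, using KL instead of CE for the uniform term only shifts the score by a constant and does not change how inputs are ranked; the sign in front of the term is what matters.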
The excerpt is from ss-ood/multiclass_ood/test_auxiliary_ood.py, lines 182 to 185 in 2a284be.