Dear author,
I believe I've found a point of inconsistency between the paper and the provided code, and I would be grateful if you could clarify it for me.
The paper states that the Visual Knowledge Distillation loss uses Relational Knowledge Distillation (RKD). However, I could not find an RKD implementation for this part of the code. Instead, the corresponding line appears to compute the loss by directly matching the student and teacher feature vectors, rather than matching the pairwise relations between samples as RKD does.
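To make the distinction concrete, here is a minimal sketch of the two losses as I understand them. It is my own plain-Python illustration, not code from the repository. The function names are hypothetical, and I only show the distance-wise RKD term (pairwise distances normalized by their batch mean, compared with a Huber loss), not the angle-wise term.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_distances(feats: List[Vector]) -> List[float]:
    """All pairwise distances (i < j), divided by their mean, as in RKD's distance-wise potential."""
    d = [euclidean(feats[i], feats[j])
         for i in range(len(feats)) for j in range(i + 1, len(feats))]
    mean = sum(d) / len(d)
    return [x / mean for x in d]


def huber(x: float, delta: float = 1.0) -> float:
    """Huber (smooth L1) penalty on a scalar difference."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)


def rkd_distance_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Hypothetical RKD distance-wise loss: matches relations between samples, not raw features."""
    ds = normalized_distances(student)
    dt = normalized_distances(teacher)
    return sum(huber(s - t) for s, t in zip(ds, dt)) / len(ds)


def direct_matching_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Hypothetical direct feature matching: element-wise MSE, requires equal feature dimensions."""
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student, teacher):
        if len(s_vec) != len(t_vec):
            raise ValueError("direct matching needs student and teacher features of equal size")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


# A student whose features are the teacher's scaled by 2 preserves all relations.
teacher = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
student = [[2.0 * x for x in v] for v in teacher]
print(rkd_distance_loss(student, teacher))     # relational loss ignores the global scale
print(direct_matching_loss(student, teacher))  # direct matching penalizes it
```

The scaled student above gets zero RKD distance loss but a nonzero direct-matching loss, which is why I would expect the two formulations to behave differently in training.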
Could you please explain this discrepancy?