Hello and thanks for the fantastic work on FAMO!
I've been experimenting with FAMO in some personal projects and have observed an interesting behavior related to loss values. It appears that when individual losses are less than 1, the FAMO loss becomes negative, and when they are greater than 1, the FAMO loss is positive. This seems to stem from the logarithmic operation in get_weighted_loss.
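To make the sign behavior concrete, here is a minimal sketch of what I believe is going on. It is a simplified stand-in, not the repository's code: I am assuming the weighted loss is essentially a softmax-weighted sum of log-losses divided by a positive normalizer, and the names `weighted_log_loss`, `losses`, and `logits` are my own.

```python
import math

def weighted_log_loss(losses, logits):
    """Softmax-weighted sum of log-losses (hypothetical simplification).

    Assumption: task weights z come from a softmax over `logits`, and the
    sum is divided by a positive normalizer c = sum(z_i / loss_i).
    """
    # Numerically stable softmax over the task logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    z = [e / total for e in exps]

    # Positive normalizer, so it cannot change the sign of the result.
    c = sum(zi / li for zi, li in zip(z, losses))

    # log(l) < 0 for l < 1 and log(l) > 0 for l > 1, which sets the sign.
    return sum(zi * math.log(li) for zi, li in zip(z, losses)) / c

print(weighted_log_loss([0.5, 0.2], [0.0, 0.0]))  # all losses below 1
print(weighted_log_loss([2.0, 3.0], [0.0, 0.0]))  # all losses above 1
print(weighted_log_loss([0.5, 2.0], [0.0, 0.0]))  # mixed: log terms cancel
```

Under these assumptions the sign of the aggregate is decided entirely by the log terms, and in the mixed case the negative and positive contributions can cancel each other.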
Interestingly, when all individual losses are uniformly less than 1 or greater than 1, the aggregated FAMO loss seems to converge towards zero. I'm not entirely clear on the mechanics behind this behavior.
I am curious about how FAMO is expected to perform in scenarios where some losses exceed 1 while others do not. Additionally, when I try to add learning rate (LR) scheduling under these conditions, it seems to fail.
Could you provide any insights or recommendations for handling such cases? Maybe there's a different approach or adjustment to the LR scheduling that might help?