Thank you very much for your great work in preparing the datasets and benchmarking the models.
I have the following two questions.
- In Table 3 of the FinBen paper, some values highlighted in bold as the best are not actually the best: other columns contain better or equal values. Could there be typos in the values, or errors in the bold highlighting?
  - For example, on the German dataset, the highest MCC value is 0.05, but 0.03 is in bold. On the taiwan dataset, the highest MCC value is 0.00, but -0.01 is in bold. On the ccf dataset, there are multiple 0.00 values, but only one of them is in bold. On the polish dataset, there are multiple 0.92 F1 values, but only one of them is in bold.
- In the FinBen paper (Table 3), the MCC values on the following datasets are very low (close to zero): German, LendingClub, ccf, ccfraud, polish, taiwan, portooseguro, travelinsurance. However, in the Open-FinLLMs paper (Table 5), the MCC values are significantly higher (even for pre-trained Llama models). Did you modify these datasets for the Open-FinLLMs paper, or are you using a different metric?
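For reference, here is a minimal sketch of the standard binary MCC formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It assumes 0/1 labels and the common convention of returning 0 when the denominator is zero; this is not necessarily how either paper computes it. With hypothetical imbalanced labels (10 positives, 90 negatives), it shows that a model predicting only the majority class gets an MCC of exactly 0 despite 90% accuracy, which is one possible reason for near-zero values.

```python
import math


def mcc(y_true, y_pred):
    """Binary Matthews correlation coefficient for 0/1 labels.

    Assumed convention: return 0.0 when the denominator is zero,
    e.g. when the model predicts only one class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced labels: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90

# A model that always predicts the majority class: 90% accuracy, MCC 0.
print(mcc(y_true, [0] * 100))  # 0.0

# A model with TP=5, FN=5, FP=5, TN=85.
y_pred = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
print(round(mcc(y_true, y_pred), 3))  # 0.444
```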