Great work! I have a quick question regarding the token definition in Table 1 to ensure a fair comparison:
1. Your method: Does the token count (64/128/192) refer to the average number of tokens propagated per layer, or strictly to keep_tokens after the pruning layer? (e.g., prune_layer=3 with keep_tokens=192 works out to an average of ~216 tokens per layer over the full depth of LLaVA-1.5)
2. Baselines: For baselines like FastV, PDrop, and SparseVLM, their reduction mechanisms differ. For instance, SparseVLM achieves its 1505 MME score with an average of 64 tokens per layer via progressive pruning (layers [2, 6, 15] with budgets [66, 30, 17]). Were these baselines aligned based on the average layer-wise token cost or their respective configuration budgets?
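To make the "average tokens per layer" definition concrete, here is a minimal sketch of the calculation I have in mind. It is only an illustration under stated assumptions, not your implementation or that of any baseline: the helper name `average_tokens_per_layer` is hypothetical, LLaVA-1.5 is assumed to have 576 visual tokens and 32 decoder layers, and pruning layers are treated as 0-indexed, so the ~216 figure corresponds to the first two layers seeing all 576 tokens.

```python
def average_tokens_per_layer(prune_layers, budgets, full_tokens=576, num_layers=32):
    """Mean number of visual tokens processed per decoder layer.

    Assumptions (not taken from any official code):
    - layers before prune_layers[0] see all `full_tokens` tokens;
    - from layer prune_layers[i] (0-indexed) onward, budgets[i] tokens
      are kept until the next pruning layer or the last layer.
    """
    if len(prune_layers) != len(budgets):
        raise ValueError("prune_layers and budgets must have the same length")

    # Segment boundaries: [0, p1, p2, ..., num_layers]
    boundaries = [0, *prune_layers, num_layers]
    # Token count inside each segment: full, then one budget per pruning layer
    counts = [full_tokens, *budgets]

    total = sum(
        count * (end - start)
        for count, start, end in zip(counts, boundaries, boundaries[1:])
    )
    return total / num_layers


# Single-stage pruning: 2 full layers, then 192 tokens for the remaining 30.
print(average_tokens_per_layer([2], [192]))  # 216.0

# Progressive schedule quoted above for SparseVLM.
print(average_tokens_per_layer([2, 6, 15], [66, 30, 17]))  # 61.71875
```

Under this convention the progressive schedule averages about 61.7 rather than exactly 64, so the result is sensitive to the indexing convention and to details this sketch ignores, which is part of why I am asking which definition Table 1 uses.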
Looking forward to your clarification. Thanks!