Hi, and thanks for the great work on this project!
I'm currently working with the training code and noticed a possible inconsistency. The documentation and flags suggest support for --text_encoder_type bert, but the dataset still appears to load GloVe embeddings via this line:
self.w_vectorizer = WordVectorizer(pjoin(opt.cache_dir, 'glove'), 'our_vab')
This occurs in:
data_loaders/humanml/data/dataset.py
This raises a few questions:
Is BERT actually used anywhere in the dataset loading or preprocessing pipeline?
If BERT is supported, where is it being applied?
Is there a separate dataset class or flow for BERT-based encoding?
I’d love clarification so I can ensure the correct embeddings are used during training.