Is it possible to also release the prompt that was used for each protein in protein_catalogue to produce its generation? #5

@qwu01

Hey,
Congrats on the cool BioReason-Pro!
I was trying to obtain the SFT- and RL-generated texts for a subset of proteins (yeast, to be specific) by running predict.py, but all three sections of the output (think, summary, and terms, i.e. InterPro and GO BP/CC/MF terms) differ from what is released in the protein_catalogue dataset on Hugging Face. The terms are mostly the same and the summary is semantically similar, but the text is not identical.

Because the predicted GO terms (go_pred) are released in the Hugging Face training and test datasets, I used the released go_pred and bypassed the GO-GPT step; I only ran GO-GPT for proteins that have no go_pred data. Might this be where the discrepancy comes from?
For reproducibility, I think it would be best if the prompt used for each protein were released alongside the generation field of protein_catalogue.
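
The go_pred fallback described above can be sketched roughly as follows. This is only an illustration: `select_go_terms`, `released_go_pred`, and `run_go_gpt` are hypothetical names, not part of the BioReason-Pro code, and the protein IDs and GO terms are placeholders.

```python
from typing import Callable, Dict, List, Tuple


def select_go_terms(
    protein_id: str,
    released_go_pred: Dict[str, List[str]],
    run_go_gpt: Callable[[str], List[str]],
) -> Tuple[List[str], str]:
    """Return the GO terms for a protein plus a label for where they came from.

    Hypothetical helper: prefers a released go_pred entry and only calls
    the (stand-in) GO-GPT predictor when no released entry exists.
    """
    terms = released_go_pred.get(protein_id)
    if terms:  # a released, non-empty prediction is available
        return list(terms), "released"
    # No released prediction: fall back to running the predictor
    return list(run_go_gpt(protein_id)), "go_gpt"


# Toy data: placeholder IDs and terms, and a stand-in for the real predictor
released = {"PROT_A": ["GO:0000001"]}


def fake_go_gpt(protein_id: str) -> List[str]:
    return ["GO:0000002"]


for pid in ("PROT_A", "PROT_B"):
    print(pid, select_go_terms(pid, released, fake_go_gpt))
```

Recording the source label next to each protein would make it easy to check whether the mismatches cluster in the proteins that used the released go_pred or in those that went through GO-GPT.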

Qi
