Hi,
I’m encountering an issue with the filter_for_prokaryotes argument in the ko2kegg_abundance() function and would like to understand what is happening.
When I run the function with filter_for_prokaryotes = TRUE and FALSE, I obtain the same number of pathways in both cases (550 pathways). Pathways classified under Human Diseases are still present even when filter_for_prokaryotes = TRUE, whereas I expected them to be removed. I am using PICRUSt version 2.5.13. Do you have any idea why the filtering does not seem to have any effect?
I also have a few questions about the databases and the annotation of PICRUSt2 output files:
Why does the ko2kegg_abundance() function return ko pathways that are subsequently not annotated during the pathway_annotation() step? For example, I get the following warning message: WARN: KO ID ko99980 not found in KEGG database (HTTP 404)
Do these two functions rely on different versions of the KEGG database? I thought KEGG was no longer an open-source database, so which KEGG version does ggpicrust2 actually use? Do these functions rely on the paid KEGG database?
After pathway annotation, some pathways appear to be highly abundant in my dataset even though only a very small number of their constituent KOs are present (as visualized with KEGG Mapper). Do you know if a pathway is considered present as soon as one of its constituent KOs is detected? Is there a way to normalize pathway abundance by the number of KOs detected in each pathway?
Finally, PICRUSt2 mentions that the MetaCyc database is supposed to be an open-source alternative to KEGG, but when I try to access it this does not seem to be the case. I therefore have the same question as for KEGG: which MetaCyc version does ggpicrust2 actually use? Is it possible that the package functions rely on the paid MetaCyc database?
Sorry if these are basic questions; I am still a beginner with PICRUSt2, KEGG and MetaCyc.
Thanks in advance for your help!
Best regards