This repository contains a prototype AI agent for capturing provenance in scientific notebooks to improve transparency.
Given a Jupyter Notebook, this standalone agent automatically generates a bibliography for the data and software used in it.
It is designed to extract the notebook's content and identify imported libraries, imported data, and which data is actually used after filtering. Data and software are clearly distinguished to help scientists credit every aspect of their research.
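As an illustration only, here is a minimal sketch of the first extraction step, assuming notebooks are stored in the standard nbformat v4 JSON layout. It uses Python's built-in `ast` module to collect top-level imported libraries and data files passed as string literals to common loading functions. The `DATA_LOADERS` set and the function names are hypothetical and not this agent's actual implementation; the sketch also does not attempt the harder step of tracking which data survives filtering.

```python
import ast
import json

# Hypothetical set of call names treated as data-loading functions (e.g. pandas read_csv,
# xarray open_dataset, numpy loadtxt). A real agent would need a much richer list.
DATA_LOADERS = {"read_csv", "read_excel", "read_table", "read_json",
                "read_parquet", "open_dataset", "loadtxt", "genfromtxt"}


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON on disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_cells(nb):
    """Yield the source text of every code cell (nbformat v4 layout)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # "source" may be stored as a list of lines or as a single string.
            yield "".join(src) if isinstance(src, list) else src


def find_imports_and_data(nb):
    """Return (sorted top-level library names, data paths passed as string literals)."""
    libraries = set()
    data_sources = []
    for src in code_cells(nb):
        # Drop IPython magics (%) and shell escapes (!), which are not valid Python.
        lines = [ln for ln in src.splitlines()
                 if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    libraries.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Relative imports (level > 0) refer to local code, not libraries.
                libraries.add(node.module.split(".")[0])
            elif isinstance(node, ast.Call):
                func = node.func
                name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
                first = node.args[0] if node.args else None
                if (name in DATA_LOADERS and isinstance(first, ast.Constant)
                        and isinstance(first.value, str)):
                    data_sources.append(first.value)
    return sorted(libraries), data_sources


# Usage on an in-memory notebook; real notebooks would come from load_notebook(path).
example_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis"]},
        {"cell_type": "code",
         "source": ["%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n"]},
        {"cell_type": "code",
         "source": "from scipy.stats import linregress\ndf = pd.read_csv('sst_record.csv')\n"},
    ]
}
libs, data = find_imports_and_data(example_nb)
print(libs)  # ['numpy', 'pandas', 'scipy']
print(data)  # ['sst_record.csv']
```

Parsing with `ast` rather than regular expressions avoids false matches inside comments and strings, and because imports and loader calls are different node types, software and data end up in separate lists.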
This AI agent is a prototype for a larger agent that, in addition to identifying data and libraries, will be able to prompt the user when a citation cannot be found (and update its context accordingly) and help users deposit their own data when it is used in a notebook.
Although this agent is a prototype intended for future integration into PaleoPAL, it is separate from the three main PaleoPAL agents; its purpose is to automate the tedious task of manual citation.
The research presented here is supported by NSF #2425885 and the NSF REU program.