Virtual Research Environments (VREs) are essential for modern data-driven research, yet the manual overhead of configuring these environments remains a significant barrier for researchers in the Social Sciences and Humanities (SSH). The idea is to provide a framework to automate VRE setup by leveraging Agentic AI for metadata generation. By populating a Knowledge Graph with CodeMeta descriptions generated automatically from software repositories, we enable a machine-readable pipeline that bridges the gap between raw data access and functional, containerized research environments.
- Which tools and versions can manipulate my specific data?
- Are the software licenses compatible with my research stack and institutional requirements?
- What are the specific operating system, memory, and CPU requirements?
- Where is the documentation, and how should packages be sequenced during installation?
- Is there a citable publication or an ORCID associated with the authors?
- Automatic configuration of SANE VMs using information from the Knowledge Graph.
Install Multipass to create your VMs locally (Experiments with Macbook):
brew install --cask multipass
Install required python libraries:
pip install -r requirements.txt
or individual:
pip install pyyamlpip install requestspip install flaskpip install rdflib
Execution
python extrac_run.py
Web interface
python web4.py
extrac_run.py creates a ansible script deploy.yml file and runs a VM configured by it.
One of the ideas is to use this ansible script file to create a vm in other environments; there's no need to be multipass on a MacBook or another operational system.
The script leverages macOS's Hypervisor via Multipass. Here is a high-level look at how the layers interact:
- YAML Parsing: The script uses PyYAML to turn your human-readable settings into a Python dictionary.
- Resource Allocation: It maps the CPUs, memory, and disk keys directly to the command-line arguments that Multipass requires.
- Cloud-Init: If you want the VM to come pre-installed with software (like Git or Docker), the cloud_init section handles that automatically upon the first boot.
- Native Performance: Because it uses the Virtualization Framework (on Apple Silicon or Intel), there is very little overhead compared to heavier tools like VirtualBox.
Once your script has started the VM, you can manage it from your terminal. Here are some useful commands:
- Enter the VM
multipass shell build-box - Check Status
multipass list - Stop the VM
multipass stop build-box - Delete the VM
multipass delete --purge build-box