MOF-ChemUnity

MOF-ChemUnity is a knowledge graph database containing computational and experimental information for more than 15,000 metal-organic frameworks (MOFs), developed using large language models. For more details, please refer to our paper (see the Citation section).

Installation

To keep your local Python environment clean, it is recommended that you create a new environment before installing this package. You can use virtualenv to create the environment then activate it.

virtualenv chemunity_env
source chemunity_env/bin/activate

Prerequisites

MOF-ChemUnity uses Neo4j as the graph database engine; see the official Neo4j documentation for installation instructions. If you plan to work from Python, install the Neo4j Python driver:

pip install neo4j

Once you have activated the virtual environment, you can install the MOF-ChemUnity agents using the following commands.

First, clone the GitHub repository to a local folder:

git clone https://github.com/AI4ChemS/MOF_ChemUnity.git
cd MOF_ChemUnity

Then upgrade the build tools and build the package wheel (.whl):

python -m pip install --upgrade pip setuptools wheel build

python -m build 

Finally, install the package using the following command:

pip install dist/*.whl 

Alternatively, you can specify which wheel file to install:

pip install dist/your_mof_chemunity_wheel_file_name.whl

Whenever you use the classes and functions in this package, make sure the virtual environment in which you installed it is activated.
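
As a quick sanity check after installation, the following minimal sketch reports whether a virtual environment is active and whether the packages can be found. It relies only on the standard library; the helper names `in_virtualenv` and `is_installed` are illustrative and not part of MOF-ChemUnity.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside an environment created by venv or a recent virtualenv release,
    # sys.prefix points at the environment while sys.base_prefix still
    # points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(package: str) -> bool:
    """Return True when a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # "MOF_ChemUnity" is the import name used in the query example of this README.
    print("MOF_ChemUnity importable:", is_installed("MOF_ChemUnity"))
    print("neo4j driver importable:", is_installed("neo4j"))
```

If either package is reported as missing, the most likely cause is that the environment was not activated before running `pip install`.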

Usage

Most users who want to query the knowledge graph can simply import MOF-ChemUnity into their local Neo4j instance and then use the QueryAgent. This repository provides sample data as CSV files, and the neo4j_import.py script helps you import MOF-ChemUnity into your local Neo4j instance.

You need to have the following CSV files (Note - all files contain CSD reference code and DOI corresponding to the reference paper for each row):

| CSV File Name | Description |
| --- | --- |
| matching.csv | Contains the MOF name associated with each CSD reference code from a given paper (DOI) |
| filtered_experimental_properties.csv | Contains the properties extracted from literature |
| computational_properties.csv | Contains computational labels and properties from CSD, CoRE MOF 2019 and QMOF |
| filtered_applications.csv | Contains the applications associated with each CSD reference code extracted from literature |
| synthesis.csv | Contains the synthesis protocols for each MOF extracted from a given paper |
| water_stability.csv | Contains the water stability for MOFs extracted from literature |
| descriptors.csv | Computed geometric descriptors and revised autocorrelations (RACs) |
| all_props.csv | Pre-filter and pre-standardization result for properties extraction |
| applications.csv | Pre-filter and pre-standardization result for application extraction |
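
Before running the import, it can help to confirm that all of the listed files are present. The sketch below is a small standard-library check, not part of the repository; the function name `missing_csv_files` is hypothetical, and the folder to check is whatever folder holds your CSV files.

```python
from pathlib import Path

# File names taken from the list of CSV files in this README.
EXPECTED_CSV_FILES = (
    "matching.csv",
    "filtered_experimental_properties.csv",
    "computational_properties.csv",
    "filtered_applications.csv",
    "synthesis.csv",
    "water_stability.csv",
    "descriptors.csv",
    "all_props.csv",
    "applications.csv",
)


def missing_csv_files(folder):
    """Return the expected CSV file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_CSV_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Checks the current working directory; pass another folder if needed.
    missing = missing_csv_files(".")
    print("Missing CSV files:", ", ".join(missing) if missing else "none")
```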

In your script, first set the environment variables used to connect to Neo4j:

import os

os.environ["NEO4J_URI"] = "your neo4j uri"
os.environ["NEO4J_USER"] = "your neo4j user name (should have read/write access)"
os.environ["NEO4J_PASSWORD"] = "your neo4j password"
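
A missing or empty variable is an easy mistake to make, so a short check before connecting can save some debugging. This is a minimal sketch under the assumption that the three variables above are the only ones required; `missing_neo4j_vars` is a hypothetical helper, not a function provided by the package.

```python
import os

# Variable names taken from the snippet above.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def missing_neo4j_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_neo4j_vars()
    if missing:
        print("Missing Neo4j settings:", ", ".join(missing))
    else:
        print("All Neo4j environment variables are set.")
```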

Then import the data from the CSV files into Neo4j using the neo4j_import.py script. Run the script in the same folder as the matching and extraction notebooks:

python3 neo4j_import.py

Once this is done, you should not run the previous steps again unless you have new data from new CSV files that you want to add to the knowledge graph. At this point, you have a sample of MOF-ChemUnity available for you to use.
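
One way to confirm that the import populated the graph is to count its nodes with the Neo4j Python driver. The sketch below assumes the three environment variables are set and that the `neo4j` driver is installed; `count_nodes` and `main` are illustrative names, and the Cypher query is a generic node count that makes no assumption about the MOF-ChemUnity schema.

```python
import os

# Generic Cypher query: counts every node regardless of label.
COUNT_QUERY = "MATCH (n) RETURN count(n) AS n"


def count_nodes(driver) -> int:
    """Return the total number of nodes visible through `driver`."""
    with driver.session() as session:
        record = session.run(COUNT_QUERY).single()
        return record["n"]


def main() -> None:
    # Imported here so the helper above can be used without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        os.environ["NEO4J_URI"],
        auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"]),
    )
    try:
        driver.verify_connectivity()  # raises if the server cannot be reached
        print("Nodes in graph:", count_nodes(driver))
    finally:
        driver.close()
```

Call `main()` after the import has finished; a count of zero suggests the import did not run against the instance your environment variables point to.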

Quick note on using this repository

The basic class used in this work is BaseAgent, from which other agents inherit specific functions. Each operation uses a dedicated agent, so there is an agent for each of the tasks presented in the work. BaseAgent also allows more advanced users to create their own agents for customized tasks, such as extracting other information from literature. More on that later.

Querying the knowledge graph using QueryAgent

The easiest way to query MOF-ChemUnity is to use the QueryGenerationAgent. This repository includes demo notebooks that query MOF-ChemUnity for various tasks, including prediction (task 1), inference (task 2), recommendation (task 3) and simple retrieval (task 4). Our paper reports that query results grounded in MOF-ChemUnity score substantially higher in quality (Q) and trustworthiness (T) on all of these tasks.

[Figure: GPTComparison]

With that, to query MOF-ChemUnity all you need is the following 3 lines of code:

from MOF_ChemUnity.Agents.QueryAgent import QueryGenerationAgent

# Connects to graph using environment variables (NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)
agent = QueryGenerationAgent()

# Return a Pandas DataFrame
query_result = agent.run_full_query("your question about MOFs in MOF-ChemUnity")
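
When you have several questions, you can reuse one agent and collect the results. The following sketch assumes only the `run_full_query` method shown above; `run_questions` is a hypothetical helper and the question strings in the usage comment are placeholders.

```python
def run_questions(agent, questions):
    """Ask each question in turn and return the results keyed by question."""
    results = {}
    for question in questions:
        # Each call is expected to return a Pandas DataFrame (see above).
        results[question] = agent.run_full_query(question)
    return results


# Usage (requires a populated Neo4j instance and the environment variables):
#
#   from MOF_ChemUnity.Agents.QueryAgent import QueryGenerationAgent
#   agent = QueryGenerationAgent()
#   results = run_questions(agent, ["first question", "second question"])
#   for question, frame in results.items():
#       print(question, len(frame))
```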

Main

MOF-ChemUnity is a knowledge graph that combines the available computational information for MOFs with the experimental information found in the literature in a single database. Computational information includes computed geometric descriptors (using Zeo++), revised autocorrelations (RACs), and available gas uptake labels and electronic properties from CoRE-MOF and QMOF, respectively. The experimental information was extracted automatically using Large Language Models (LLMs), specifically GPT-4o. Extracted information includes MOF names, properties, applications, and synthesis protocols mentioned in the literature. Finally, all data is linked together using CSD reference codes. In other words, you can find a MOF by its name or by its CSD reference code, because the database features a one-to-one link between them.
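
To illustrate how a shared CSD reference code lets records from different files be joined, here is a minimal standard-library sketch. Everything in the sample data is a made-up placeholder, and the column names `refcode`, `mof_name`, `property` and `value` are assumptions rather than the real headers of the CSV files.

```python
import csv
import io

# Placeholder rows only; column names are assumed, not the real CSV headers.
MATCHING_CSV = """refcode,mof_name,doi
AAAAAA,Example-MOF-1,10.0000/placeholder-1
BBBBBB,Example-MOF-2,10.0000/placeholder-2
"""

PROPERTIES_CSV = """refcode,property,value,doi
AAAAAA,placeholder_property,1.0,10.0000/placeholder-1
BBBBBB,placeholder_property,2.0,10.0000/placeholder-2
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def properties_by_name(matching_rows, property_rows):
    """Group property records under the MOF name that shares their refcode."""
    name_by_refcode = {row["refcode"]: row["mof_name"] for row in matching_rows}
    linked = {}
    for row in property_rows:
        name = name_by_refcode.get(row["refcode"])
        if name is None:
            continue  # no matching MOF name for this reference code
        linked.setdefault(name, []).append((row["property"], row["value"]))
    return linked


if __name__ == "__main__":
    linked = properties_by_name(read_rows(MATCHING_CSV), read_rows(PROPERTIES_CSV))
    for name, records in linked.items():
        print(name, records)
```

In the actual knowledge graph this linking is handled by Neo4j relationships; the dictionary join above only mirrors the idea on flat files.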

What can you do with MOF-ChemUnity

MOF-ChemUnity enables improved querying performance when asking GPT-4o questions about MOFs (see the GPTComparison figure above). It also includes a wide variety of information that was not easily available before, such as applications, and seamlessly links this information to the available computational labels via CSD reference codes. This makes it possible to find MOFs similar to other MOFs, as shown in [this demo](link to demo).

Can more information be included

Absolutely! In [this demo](link to cross-document demo) we show how MOF names allow users to search indexing services like CrossRef or Web of Science for related papers that further explore a MOF. They can then run the extraction workflows and append the extracted data to MOF-ChemUnity. Additionally, new MOFs can be added by running the matching notebook and then the extraction notebook.

Citation

@article{pruyn2025mof,
  title={MOF-ChemUnity: Literature-Informed Large Language Models for Metal--Organic Framework Research},
  author={Pruyn, Thomas Michael and Aswad, Amro and Khan, Sartaaj Takrim and Huang, Ju and Black, Robert and Moosavi, Seyed Mohamad},
  journal={Journal of the American Chemical Society},
  year={2025},
  publisher={ACS Publications},
  doi={10.1021/jacs.5c11789}
}

Privacy

We do not collect any user information in this work. You must use your own OpenAI API key, and we neither store nor see any inputs or outputs.

Full Data Availability

For copyright reasons, the complete dataset is not hosted. Please contact the project supervisor for the full data and project collaborations.

License

This repository uses a dual-license model:

🚫 Commercial use is not permitted without prior written consent.

Contact

For questions, collaborations, or commercial licensing, reach out to:

📧 mohamad.moosavi@utoronto.ca
