Multi-Agent MATE

We introduce MATE, a multimodal accessibility multi-agent framework that performs modality conversion tasks based on the user's needs. The system assists people with disabilities by converting data into a format they can perceive and understand. For example, if a user who cannot hear well receives an audio file, the system converts it into a video with subtitles. MATE can be applied in many domains, such as healthcare, and can serve as an assistant for various groups of users.

Model

In addition to MATE, we introduce ModCon-Task-Identifier, a fine-tuned BERT model that recognizes the modality conversion task type from the user prompt. Numerous experiments show that the model significantly outperforms existing LLMs as well as machine learning classifiers (e.g., logistic regression and CatBoost).

Since the model is relatively large (~450 MB), it could not be uploaded to GitHub. Instead, it is publicly released on Hugging Face. You can find it at https://huggingface.co/AleksandrAlgazinov/ModCon-Task-Identifier.
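As a rough illustration of how the released checkpoint might be used, here is a minimal sketch. It assumes the checkpoint loads through the standard Hugging Face transformers text-classification pipeline, which this README does not confirm. The helper names load_task_identifier and identify_task are hypothetical and are not part of this repository.

```python
MODEL_ID = "AleksandrAlgazinov/ModCon-Task-Identifier"


def load_task_identifier(model_id: str = MODEL_ID):
    """Return a text-classification pipeline for the released checkpoint.

    Assumption: the checkpoint is compatible with the generic pipeline API.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def identify_task(classifier, prompt: str) -> str:
    """Return the top predicted label for a user prompt."""
    # A text-classification pipeline returns a list of
    # {"label": ..., "score": ...} dicts, one per input text.
    return classifier(prompt)[0]["label"]
```

Calling identify_task(load_task_identifier(), "Turn this recording into text") downloads the checkpoint on first use. The label strings it returns depend on the checkpoint's configuration and may not match the acronyms used in this README.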

Installation

After cloning or downloading this repository, you need to set up the constants.py file.
This file contains all environment variables that are used in this project and that you can customize to fit your needs.

In the root folder, you will find constants_template.py. Copy it, or rename it, to constants.py. A good way to get started is to use our agent system based on the glm-4-flash model, since its API calls are free. Set your API key, which you obtain after logging in on the model provider's website.
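The copy step can also be scripted. The following is a minimal sketch that assumes the template sits in the repository root. The helper name create_constants is hypothetical and is not part of this repository.

```python
import shutil
from pathlib import Path


def create_constants(root: str = ".") -> Path:
    """Copy constants_template.py to constants.py unless it already exists."""
    root_path = Path(root)
    template = root_path / "constants_template.py"
    target = root_path / "constants.py"
    if not template.is_file():
        raise FileNotFoundError(f"Template not found: {template}")
    if not target.exists():
        # Copy rather than rename so the template stays available,
        # and never overwrite a constants.py that was already customized.
        shutil.copyfile(template, target)
    return target
```

After running create_constants() from the repository root, open constants.py and fill in your API key by hand.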

Usage

To execute the framework, you can either use the Jupyter Notebook file or the Python script.

Jupyter Notebook

After launching the Jupyter Notebook run_agents.ipynb (in VS Code, press the "Run All" button at the top), an input box will appear, asking you to enter your desired conversion.
The detected modality conversion will be printed so you can check whether it is correct. If it is not, restart the code and enter another prompt.

After entering a prompt, follow the printed instructions to indicate the source input to convert.
The program will print the output path after the agent finishes.

Python script

Although the pipeline is the same in the Python script, it is launched differently.
In your terminal, run the following command:

python3 run_agents.py

Acronym meanings

Acronym	Associated Modality Conversion
STT	Speech to Text
VTT	Video to Text
TTS	Text to Speech
ITT	Image to Text
ITA	Image to Audio
TTI	Text to Image
ATI	Audio to Image
TTV	Text to Video
ATV	Audio to Video
UNK	Unknown conversion
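For reference in code, the table above can be written as a plain lookup. This is a minimal sketch. The names MODALITY_CONVERSIONS and describe_conversion are hypothetical, and falling back to UNK for unrecognized input is an assumption about how unknown acronyms should be handled.

```python
MODALITY_CONVERSIONS = {
    "STT": "Speech to Text",
    "VTT": "Video to Text",
    "TTS": "Text to Speech",
    "ITT": "Image to Text",
    "ITA": "Image to Audio",
    "TTI": "Text to Image",
    "ATI": "Audio to Image",
    "TTV": "Text to Video",
    "ATV": "Audio to Video",
    "UNK": "Unknown conversion",
}


def describe_conversion(acronym: str) -> str:
    """Return the conversion name for an acronym, or the UNK entry."""
    key = acronym.strip().upper()
    # Assumption: anything outside the table is treated as UNK.
    return MODALITY_CONVERSIONS.get(key, MODALITY_CONVERSIONS["UNK"])
```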

