This guide walks you through installing and running Cape. We tested the current setup on a Mac equipped with an Apple M1 Pro chip (10-core CPU), 32 GB of RAM, and a 1 TB SSD.
Fully deploying Cape takes three stages:
- Configuring the OpenWebUI-based Frontend (step 1/3) - at this point you will not yet be able to interact with Gemma3n locally
- Configuring the UI OpenWebUI Backend and Ollama (step 2/3) - at this point you will be able to interact with Gemma3n locally (text only)
- Configuring a custom inference server that hosts Gemma3n in full precision via HuggingFace (step 3/3) - at this point Cape will be able to use Gemma3n locally to process images and videos
Make sure the following tools are installed on your system:
- Conda
- Node.js + npm (v16 or higher)
- ollama
- venv
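As a quick sanity check before starting, the prerequisites above can be verified from Python. This is a minimal sketch, not part of Cape: the executable names (`conda`, `node`, `npm`, `ollama`) are assumptions about how the tools appear on your PATH, and it assumes the "v16 or higher" requirement refers to the Node.js version. `venv` ships with Python's standard library, so it is not checked here.

```python
import re
import shutil
import subprocess

# Executable names are assumptions about how the listed tools appear on PATH.
REQUIRED_TOOLS = ("conda", "node", "npm", "ollama")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def node_major_version(version_text):
    """Parse the major version from `node --version` output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    if "node" not in missing:
        output = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=False
        ).stdout
        # Assumption: the guide's "v16 or higher" applies to Node.js.
        print("Node.js major version:", node_major_version(output), "(16 or higher expected)")
```

If any tool is reported missing, install it before continuing with step 1.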
1. Configuring the OpenWebUI-based Frontend (step 1/3)

    git clone https://github.com/spetrescu/cape.git
    cd cape/src/ui/open-webui
    cp -RPp .env.example .env

Install required packages (may need --force to resolve version issues):

    npm install --force

Start the frontend development server:

    npm run dev

You should see the frontend running at http://localhost:5173/.
2. Configuring the UI OpenWebUI Backend and Ollama (step 2/3)

    cd backend

Create and activate the Python environment (Python 3.11 required):

    conda create --name open-webui python=3.11
    conda activate open-webui

Ensure Ollama is installed and running:

    ollama serve

If you see the message "Error: listen tcp 127.0.0.1:11434: bind: address already in use", Ollama is already running in the background, which is exactly what you want; there is no need to run ollama serve again. To double-check that the port is indeed occupied by Ollama, run curl http://localhost:11434, which should return "Ollama is running".
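The same curl check can be done from Python with only the standard library. This is a minimal sketch; the function names are illustrative and not part of Cape or Ollama.

```python
import urllib.error
import urllib.request


def is_ollama_banner(body):
    """Return True if the response text contains Ollama's status banner."""
    return "Ollama is running" in body


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """GET the Ollama root URL and check for the banner; False if unreachable."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing usable is listening.
        return False
    return is_ollama_banner(body)
```

Calling `ollama_is_running()` returns `True` when the server answers on port 11434 and `False` when nothing is listening there.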
Now, in another terminal, pull gemma3n:e2b (5.6 GB download):

    ollama pull gemma3n:e2b

If your machine has the capacity, you can also pull the larger Gemma3n model by running ollama pull gemma3n:e4b (7.5 GB download).
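Once the pull finishes, you can confirm the model is available and send it a text prompt through Ollama's local REST API (`/api/tags` and `/api/generate`). This is a minimal sketch using only the standard library; the helper names are illustrative and Cape itself talks to Ollama through the OpenWebUI backend rather than through this code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def model_is_pulled(tags_payload, model="gemma3n:e2b"):
    """Check a parsed /api/tags response for a model with the given name."""
    return any(entry.get("name") == model for entry in tags_payload.get("models", []))


def build_generate_request(prompt, model="gemma3n:e2b"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_tags(timeout=5):
    """Fetch and parse the list of locally available models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=timeout) as resp:
        return json.loads(resp.read())


def ask(prompt, model="gemma3n:e2b", timeout=120):
    """Send the prompt to the local model and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running, `model_is_pulled(list_tags())` tells you whether the pull succeeded, and `ask("Reply with one short sentence.")` returns the model's text reply.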
From the backend directory:

    pip install -r requirements.txt -U

Make sure the Python version in use is 3.11 (check with python --version).
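The version requirement can also be checked from inside the interpreter, which catches the case where the conda environment is not the one actually in use. A minimal sketch (the function name is illustrative):

```python
import sys

REQUIRED = (3, 11)


def is_required_python(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor version matches `required`."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_required_python():
        print(f"Python {found}: OK")
    else:
        print(f"Python {found}: expected 3.11; is the open-webui conda environment active?")
```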
Start the backend development server:

    sh dev.sh

Navigate to http://localhost:5173/ to see the application running. At this stage, after you have created a user and a password, you should be able to see the following image:
3. Configuring a custom inference server that hosts Gemma3n in full precision via HuggingFace (step 3/3)
The goal here is to host the Gemma 3n model locally in full precision via Hugging Face's transformers library (with proper authentication and access rights), served through a Flask API. This enables the multimodal functionality, which Ollama does not support, at least as of 7 Aug 2025.
- Make sure you have a working Hugging Face token (needed to download models) and that you have requested access to the Gemma3n models (link here: https://huggingface.co/google/gemma-3n-E2B-it). Click "Access repository" to request permission from Google (required even for open weights).
Create a virtual environment and install the dependencies:

    cd cape/src/gemma3n-inference-server
    python3 -m venv cape-gemma3n-inference
    source cape-gemma3n-inference/bin/activate
    pip install -r requirements.txt

Logging in to Hugging Face is required to download the Gemma 3n model. Assuming you have been granted access to the model (see the beginning of this section), log in from the terminal:

    huggingface-cli login

Then paste your token from https://huggingface.co/settings/tokens.

Start the inference server:

    python gemma3n_inference_server.py

Once the server is running, you can verify it's working with the health_check_server.py script.
- Open a new terminal window/tab and run:

    cd cape/src/gemma3n-inference-server

- Activate the same virtual environment:

    source cape-gemma3n-inference/bin/activate

- Run the health check script:

    python health_check_server.py

If everything is working correctly, the response should look like this:
Response: {
'response': "Here's a brief description of each attached image, focusing on aspects relevant for potential debugging and system diagnosis:\n\n**Image 1: Network Equipment**\n\nThis image shows two pieces of network equipment connected via cables.\n\n* **Left Unit (Black Box):** This appears to be a network switch or router. Key observations:\n * **Ethernet Ports:** Multiple yellow Ethernet cables are connected to its ports, indicating network connectivity.\n * **USB Port:** A USB port is visible, which could be for connecting peripherals or for certain functionalities.\n * **Power Cable:** A black power cable is plugged into the unit.\n * **Labels:** There are labels indicating \"LAN,\" \"CABLE,\" and \"POWER.\" \n\n* **Right Unit (White Box):** This is a device with several Ethernet ports and LEDs. Key observations:\n * **Ethernet Ports:** Multiple yellow Ethernet cables are connected to its ports.\n * **LED Indicators:** Several LEDs are lit up, which are crucial for monitoring the device's status (link status, activity, etc.). The specific meaning of the LEDs might be found in the device's manual.\n * **Power Cable:** A black power cable is plugged into the..."
}
By this point, you should have the following processes running: the OpenWebUI frontend, Ollama, the OpenWebUI backend, and gemma3n_inference_server. If so, you should now be able to use Cape with all the functionality implemented to date by navigating to http://localhost:5173/.
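To confirm that the expected processes are listening, a small TCP probe can be run from Python. This is a minimal sketch under stated assumptions: only the frontend (5173) and Ollama (11434) ports appear in this guide, so the backend and inference-server ports are left for you to add once you know them; the names `SERVICES`, `port_is_open`, and `report` are illustrative.

```python
import socket

# Ports stated in this guide. The OpenWebUI backend and the Gemma3n inference
# server ports are not given here, so add them yourself (hypothetical entries).
SERVICES = {
    "OpenWebUI frontend": 5173,
    "Ollama": 11434,
}


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection tries every address "localhost" resolves to
        # (IPv4 and IPv6), which matters for dev servers bound to only one.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(services=SERVICES):
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_is_open(port) for name, port in services.items()}
```

Calling `report()` returns a dictionary such as one entry per service with `True` or `False`; a `False` value points at the process that still needs to be started.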
