Ollama Agent is a powerful command-line tool (CLI and REPL) that allows you to interact with local AI models. Built on DeepAgents and LangChain, it provides a persistent chat experience, session management, and the ability to execute local shell commands, turning your local models into helpful assistants for your daily tasks.
- Interactive REPL: A modern, terminal-based chat interface with Markdown rendering and slash commands.
- Non-Interactive CLI: Execute single prompts directly from your command line for quick queries.
- Native Ollama Integration: Connects directly to Ollama's native API (via `langchain-ollama`), with no OpenAI compatibility layer needed.
- Thinking / Reasoning: Leverages Ollama's native thinking capability to expose model reasoning traces. Configurable per model via `--effort`.
- Automatic Context Window: Resolves the model's effective context window (`num_ctx`) automatically from Ollama metadata, or allows a manual override in the configuration.
- Per-session Model Switching: Change the model mid-conversation and continue from that point with the new model (context preserved). The change is not permanent and only affects the current session.
- Screen Vision (Screenshots): Attach monitor screenshots to prompts using `@dpN` for visual context.
- Tool-Powered: The agent can execute shell commands via an integrated shell backend, allowing it to interact with your local environment to perform tasks.
- MCP Integration: Extend the main agent with tools from Model Context Protocol servers configured in `mcp_servers.json`; the tools are made directly available to the main agent.
- Custom Subagents: Define specialized subagents in `settings.yaml` with their own model, skills, and MCP servers, each with an isolated context for clean delegation.
- Session Management: Conversations are automatically saved and can be reloaded, deleted, or switched between.
- Task Management: Save frequently used prompts as "tasks" and execute them with a simple command.
- Configurable: Easily configure the model, Ollama host, context window, and reasoning effort.
- Persistent Memory: Native memory layer backed by `MEMORY.md`, allowing the agent to persist long-term context across sessions.
- RAG (Retrieval-Augmented Generation): Create and manage document databases for context-aware responses using local embeddings and Qdrant.
- Skills: Extend the agent with reusable, on-demand capabilities via the Agent Skills specification. Skills provide task-specific instructions and context through progressive disclosure.
Before installing/running the app, make sure you have:
- Ollama (or compatible API) running.
- A model that supports tool calling (required). If the selected model does not support tools/function-calling, the app will exit.
- The embeddings model downloaded in Ollama. By default, RAG uses `nomic-embed-text:latest`.
- A vision-capable model (optional): only required if you want to use Screen Vision (`@dpN`). If your model does not support vision, the app still works, but it cannot "see" screenshots.
# Required embeddings model (default for RAG)
ollama pull nomic-embed-text:latest

For end-users, the recommended way to install ollama-agent is with pipx, which installs the application in an isolated environment.
# Install from GitHub
pipx install git+https://github.com/arrase/ollama-agent.git

Start the interactive REPL:
ollama-agent

Or run a single prompt (non-interactive):
ollama-agent -p "List all files in the current directory as JSON."

To start the chat interface, simply run:
ollama-agent

The REPL provides a persistent chat session. You can use slash commands to manage the session:
- `/help`: Show available commands.
- `/new`: Start a new chat session (clears context).
- `/clear`: Clear the screen.
- `/models`: List available Ollama models (shows tool support).
- `/model-set <model>`: Switch to a different model (conversation preserved).
- `/sessions`: List saved sessions.
- `/session-load <id>`: Load a saved session.
- `/session-delete <id>`: Delete a saved session.
- `/tasks`: List saved tasks.
- `/task-run <id>`: Run a specific task.
- `/task-delete <id>`: Delete a specific task.
- `/rag`: Show current RAG database status.
- `/rag-list`: List available RAG databases.
- `/rag-create <name>`: Create a new RAG database.
- `/rag-load <name>`: Load a RAG database for the session.
- `/rag-unload`: Unload the current RAG database.
- `/rag-add <path>`: Add a file to the loaded RAG database.
- `/rag-add <path> --dir`: Add all files from a directory.
- `/rag-delete <name>`: Delete a RAG database.
- `/skills`: List all skills.
- `/skill-show <id>`: Show skill details.
- `/skill-create <id>`: Create a skill interactively.
- `/skill-delete <id>`: Delete a skill.
- `/exit`: Quit the application.
You can run a single prompt directly from the command line:
ollama-agent --prompt "List all files in the current directory as JSON."
# Or using the short form:
ollama-agent -p "List all files in the current directory as JSON."

Screen vision is not limited to a specific mode: it works anywhere you can type a prompt (both REPL and CLI).
Attach a screenshot of a monitor as context by including @dpN in your prompt (N is a 0-based monitor index):
ollama-agent -p "Describe what you see in @dp0"

If you include multiple tokens (e.g. `@dp0 @dp1`), the agent captures and attaches each requested monitor.
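To make the token rule concrete, here is a minimal Python sketch of how `@dpN` references could be pulled out of a prompt. The `requested_monitors` helper is hypothetical and not part of ollama-agent's API, and collapsing a repeated token into a single capture is an assumption of this sketch.

```python
import re

# Hypothetical pattern: "@dp" followed by a 0-based monitor index.
DP_TOKEN = re.compile(r"@dp(\d+)\b")


def requested_monitors(prompt: str) -> list[int]:
    """Return monitor indices referenced as @dpN, in first-seen order."""
    monitors: list[int] = []
    for match in DP_TOKEN.finditer(prompt):
        index = int(match.group(1))
        if index not in monitors:  # assumption: each monitor is captured once
            monitors.append(index)
    return monitors


print(requested_monitors("Compare @dp0 with @dp1, then look at @dp0 again"))  # [0, 1]
```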
You can override the configured model, reasoning effort, or tool execution timeout:
ollama-agent --model "gpt-oss:20b" --effort "high" --prompt "What is the current date?"
# Or using short forms:
ollama-agent -m "gpt-oss:20b" -e "high" -p "What is the current date?"

Thinking / Reasoning effort: the `--effort` flag maps to Ollama's native `think` parameter. Thinking-capable models emit a `thinking` field that separates their reasoning trace from the final answer.
| Model family | `--effort` value | Ollama `think` value | Behaviour |
|---|---|---|---|
| GPT-OSS | `low` / `medium` / `high` | `"low"` / `"medium"` / `"high"` | Sets the thinking trace length. GPT-OSS only accepts these levels; `true`/`false` is ignored. |
| GPT-OSS | `disabled` / `hide` | (not sent) | GPT-OSS cannot fully disable thinking. For `disabled`, a warning is emitted and the default level is used. In both cases, the thinking trace is hidden from the UI. |
| GPT-OSS | `enabled` | `"medium"` | Enables thinking using the default medium effort level. |
| Other thinking models (Qwen 3, DeepSeek R1, DeepSeek-v3.1, …) | `low` / `medium` / `high` / `enabled` | `true` | Enables thinking. Ollama ignores the specific level; any of these values simply turns thinking on. |
| Other thinking models | `hide` | `true` | The model generates the reasoning trace, but it is hidden from the UI output. |
| Other thinking models | `disabled` | `false` | Disables thinking at the model level. |
| Non-thinking models | (any) | (not sent) | Setting is ignored. |
Thinking is enabled by default in Ollama for supported models. See the Ollama thinking docs for the full list of supported models and API details.
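The table above can be read as a small lookup. The following Python sketch encodes it, assuming hypothetical family labels (`"gpt-oss"`, `"thinking"`, `"none"`) and a hypothetical `map_effort` helper; ollama-agent's own implementation may detect model families and structure this logic differently. In the sketch, `None` stands for "do not send `think`".

```python
import warnings
from typing import Optional, Tuple, Union

# None means "do not send the think parameter at all".
ThinkValue = Optional[Union[bool, str]]

LEVELS = ("low", "medium", "high")


def map_effort(family: str, effort: str) -> Tuple[ThinkValue, bool]:
    """Return (think value to send, hide trace in UI) for a model family and --effort value."""
    effort = effort.lower()
    if family == "none":
        return None, False  # non-thinking models: the setting is ignored
    if family == "gpt-oss":
        if effort in LEVELS:
            return effort, False  # GPT-OSS takes the level string directly
        if effort == "enabled":
            return "medium", False  # default medium level
        if effort in ("disabled", "hide"):
            if effort == "disabled":
                warnings.warn("GPT-OSS cannot fully disable thinking; using the default level")
            return None, True  # think not sent, trace hidden
    elif family == "thinking":
        if effort in LEVELS or effort == "enabled":
            return True, False  # any level simply turns thinking on
        if effort == "hide":
            return True, True  # reasoning is generated but not shown
        if effort == "disabled":
            return False, False  # thinking off at the model level
    raise ValueError(f"unsupported combination: family={family!r}, effort={effort!r}")
```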
ollama-agent --builtin-tool-timeout 60 --prompt "Run a long-running task"
# Or using short forms:
ollama-agent -t 60 -p "Run a long-running task"

Available Parameters:

- `-m`, `--model`: Specify the AI model to use.
- `-p`, `--prompt`: Provide a prompt for non-interactive mode.
- `-e`, `--effort`: Set the reasoning effort level (`low`, `medium`, `high`, `disabled`, `hide`, `enabled`).
- `-t`, `--builtin-tool-timeout`: Set the tool-call timeout in seconds (applies to tool executions, including the shell backend and built-in tools). Overrides `runtime.builtin_tool_timeout` from `settings.yaml` for the current run.
- `--rag <database>`: Load a RAG database for the session.
- `--skills-dir <dir>`: Additional skills directory (can be repeated to add multiple sources).
Tasks are saved prompts that can be executed repeatedly.
Create a Task (CLI):
ollama-agent task-create <task_id> \
--title "My task title" \
--task-prompt "Do the thing" \
--task-model "gpt-oss:20b" \
--task-effort "medium"

- Use `--force` to overwrite an existing task.
- `task_id` must be filesystem-safe (letters, numbers, `_`, `-`).
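A minimal sketch of the `task_id` rule together with the on-disk location described below (`~/.ollama-agent/tasks/<task_id>.yaml`). The helpers `is_valid_task_id` and `task_file` are hypothetical, and restricting "letters" to ASCII is an assumption.

```python
import re
from pathlib import Path

# Assumption: "letters" means ASCII letters.
TASK_ID_RE = re.compile(r"[A-Za-z0-9_-]+")
TASKS_DIR = Path.home() / ".ollama-agent" / "tasks"


def is_valid_task_id(task_id: str) -> bool:
    """True if task_id uses only letters, digits, '_' and '-'."""
    return TASK_ID_RE.fullmatch(task_id) is not None


def task_file(task_id: str, tasks_dir: Path = TASKS_DIR) -> Path:
    """Return the YAML path for a task, refusing IDs that could escape the directory."""
    if not is_valid_task_id(task_id):
        raise ValueError(f"invalid task_id: {task_id!r}")
    return tasks_dir / f"{task_id}.yaml"
```

Validating before building the path is what keeps an ID such as `../secret` from pointing outside the tasks directory.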
Create a Task (REPL):
Inside the REPL:
/task-create <task_id>
The REPL will prompt you for title/model/effort and then let you enter a multiline prompt (finish with Esc+Enter).
Create a Task (manual YAML):
Tasks are stored as YAML files in ~/.ollama-agent/tasks/. To create one, add a new file named <task_id>.yaml in that directory.
- `<task_id>` can be any filesystem-safe ID (it shows up in `task-list` and is what you pass to `task-run`).
- The YAML supports `title`, `prompt`, `model`, and (optionally) `reasoning_effort`.
Example:
title: "List repo tree"
prompt: "List all files in this repository as a tree."
model: "gpt-oss:20b"
reasoning_effort: "medium" # low|medium|high|disabled|hide|enabled

List Tasks:
ollama-agent task-list
# or inside REPL: /tasks

Run a Task:
Use the task ID (or a unique prefix) from the list to run it.
ollama-agent task-run <task_id>
# or inside REPL: /task-run <task_id>

Delete a Task:
ollama-agent task-delete <task_id>
# or inside REPL: /task-delete <task_id>

On the first run, the application creates a default configuration file at `~/.ollama-agent/settings.yaml`. You can edit this file to permanently change the default model, Ollama host, and other settings.
Example default settings.yaml:
model:
  name: qwen3.5:9b
  base_url: http://localhost:11434
  temperature: 0.0
  context_window: null
  reasoning_effort: medium
runtime:
  allow_traversal: true
  builtin_tool_timeout: 30
rag:
  rag_dir: /home/user/.ollama-agent/rag
  embedder_model: nomic-embed-text:latest
  embedder_base_url: http://localhost:11434
  embedding_dims: 768
  default_top_k: 5
  chunk_size: 500
  chunk_overlap: 50

| Key | Description |
|---|---|
| `model.name` | Default Ollama model. Must support tool calling. |
| `model.base_url` | Native Ollama host (e.g. `http://localhost:11434`). Must not contain a `/v1` path. |
| `model.reasoning_effort` | Default thinking level: `low`, `medium`, `high`, `disabled`, `hide`, or `enabled`. See Thinking / Reasoning above. |
| `model.context_window` | If set, forces the runtime `num_ctx` for the selected model. Leave `null` to let the app resolve it automatically. |
| `runtime.builtin_tool_timeout` | Timeout in seconds for tool executions. |
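Because `model.base_url` must point at the native API, a pre-flight check like the one below can catch an OpenAI-style URL early. This is a hypothetical `check_base_url` helper written for illustration, not the app's actual validation.

```python
from urllib.parse import urlparse


def check_base_url(base_url: str) -> str:
    """Validate a native Ollama host URL and return it without a trailing slash."""
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {base_url!r}")
    # The OpenAI-compatible layer lives under /v1; the native API does not.
    if "v1" in [segment for segment in parsed.path.split("/") if segment]:
        raise ValueError(f"{base_url!r} contains a /v1 path; use the bare Ollama host")
    return base_url.rstrip("/")
```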
Ollama Agent needs to know the effective context window (`num_ctx`) for every model. The runtime resolves it in this order:

1. `model.context_window` from `settings.yaml`, if defined.
2. `PARAMETER num_ctx` from `ollama show <model>` (the model's Modelfile).
3. The model's reported `*.context_length` metadata from `ollama show <model>`.
If none of those sources provides a value, the app exits with a clear error asking you to set model.context_window in settings.yaml.
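The lookup order above can be sketched as a single function. It assumes the Modelfile text and the model metadata mapping have already been fetched (for example from `ollama show`); the name `resolve_num_ctx` and its signature are hypothetical.

```python
import re
from typing import Any, Mapping, Optional

NUM_CTX_RE = re.compile(r"^\s*PARAMETER\s+num_ctx\s+(\d+)\s*$", re.MULTILINE | re.IGNORECASE)


def resolve_num_ctx(
    context_window: Optional[int],
    modelfile: str,
    model_info: Mapping[str, Any],
) -> int:
    """Resolve num_ctx: settings override, then Modelfile PARAMETER, then metadata."""
    # 1. model.context_window from settings.yaml wins when it is set.
    if context_window is not None:
        return int(context_window)
    # 2. A "PARAMETER num_ctx <n>" line in the Modelfile.
    match = NUM_CTX_RE.search(modelfile)
    if match:
        return int(match.group(1))
    # 3. Architecture-prefixed metadata such as "llama.context_length".
    for key, value in model_info.items():
        if key.endswith(".context_length"):
            return int(value)
    raise RuntimeError("could not determine num_ctx; set model.context_window in settings.yaml")
```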
If you need to reset the configuration or system prompt to their default values, you can use the --config-reset flag:
# Reset all configuration files
ollama-agent --config-reset all
# Reset only the system prompt (instructions.md)
ollama-agent --config-reset system-prompt
# Reset only the settings (settings.yaml)
ollama-agent --config-reset config-file

Note: When upgrading from v0.1 to v0.2, it is recommended to reset the system prompt to ensure compatibility with new features:
ollama-agent --config-reset system-prompt
Ollama Agent supports native tracing via LangSmith. To enable it, simply add the langsmith section to your ~/.ollama-agent/settings.yaml:
langsmith:
  api_key: "your-api-key"
  tracing: "true"
  project: "your-project-name"
  endpoint: "https://api.smith.langchain.com" # Optional, useful for EU or specific regions

When configured, the agent automatically injects these values into the environment at startup, enabling deep tracing of tool executions, reasoning steps, and agent workflows. If the section is omitted, no environment variables are injected and tracing remains disabled.
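A minimal sketch of the injection step, assuming the section's keys map onto LangSmith's standard `LANGSMITH_API_KEY`, `LANGSMITH_TRACING`, `LANGSMITH_PROJECT` and `LANGSMITH_ENDPOINT` variables. The exact variable names ollama-agent sets are an assumption here, and `inject_langsmith` is hypothetical.

```python
import os
from typing import Any, List, Mapping, MutableMapping, Optional

# Assumed mapping from settings.yaml keys to LangSmith environment variables.
LANGSMITH_ENV = {
    "api_key": "LANGSMITH_API_KEY",
    "tracing": "LANGSMITH_TRACING",
    "project": "LANGSMITH_PROJECT",
    "endpoint": "LANGSMITH_ENDPOINT",
}


def inject_langsmith(
    section: Optional[Mapping[str, Any]],
    env: Optional[MutableMapping[str, str]] = None,
) -> List[str]:
    """Copy the langsmith section into env and return the variable names that were set."""
    target = os.environ if env is None else env
    if not section:
        return []  # section omitted: nothing injected, tracing stays disabled
    injected = []
    for key, var_name in LANGSMITH_ENV.items():
        value = section.get(key)
        if value is not None:  # optional keys such as endpoint may be absent
            target[var_name] = str(value)
            injected.append(var_name)
    return injected
```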
The agent manages its own long-term memory via a file located at ~/.ollama-agent/MEMORY.md. The agent uses built-in tools to read, update, and persist facts, preferences, and context across sessions automatically.
RAG allows the agent to search through your documents and use relevant context when answering questions. Documents are chunked, embedded using Ollama, and stored in local Qdrant databases.
RAG databases are stored in ~/.ollama-agent/rag/<name>/. Each database is independent and can contain documents from different sources.
Create a Database (CLI):
ollama-agent rag-create my-docs

Create a Database (REPL):
/rag-create my-docs
List Databases:
ollama-agent rag-list
# or inside REPL: /rag-list

Delete a Database:
ollama-agent rag-delete my-docs
# or inside REPL: /rag-delete my-docs

Before adding documents, you need to load a database (in the REPL) or specify it in the command (CLI).
Add a Single File (CLI):
ollama-agent rag-add my-docs /path/to/document.md

Add a Directory (CLI):
ollama-agent rag-add my-docs /path/to/folder --dir

Add Files (REPL):
First load the database, then add files:
/rag-load my-docs
/rag-add /path/to/document.md
/rag-add /path/to/folder --dir
Supported file types include: .txt, .md, .py, .js, .ts, .json, .yaml, .yml, .html, .css, .xml, .csv, .rst, .ini, .cfg, .sh
Manual query commands have been removed from both CLI and REPL. Load a RAG database and ask your question normally — the agent will use the rag_search tool automatically when it needs document context.
Once a RAG database is loaded, the agent can automatically search it using the rag_search tool, which returns both formatted context and detailed results with relevance scores.
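Conceptually, a search ranks stored chunks by similarity to the embedded question and keeps the best `default_top_k`. The sketch below is an in-memory stand-in for the Qdrant query: cosine similarity and the `search_chunks` helper are assumptions for illustration, not the actual `rag_search` implementation, and the vectors are passed in directly instead of being computed by the embedder.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hit:
    text: str
    source: str
    score: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_chunks(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, str, Sequence[float]]],
    top_k: int = 5,
) -> Tuple[str, List[Hit]]:
    """Rank (text, source, vector) chunks against the query; return (context, hits)."""
    hits = [Hit(text, source, cosine(query_vec, vec)) for text, source, vec in chunks]
    hits.sort(key=lambda hit: hit.score, reverse=True)
    top = hits[:top_k]
    # Formatted context for the prompt, plus the scored results for inspection.
    context = "\n\n".join(f"[{hit.source}] {hit.text}" for hit in top)
    return context, top
```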
Start REPL with RAG:
ollama-agent --rag my-docs

Use RAG in Non-Interactive Mode:
ollama-agent --rag my-docs -p "What does the documentation say about configuration?"

Switch RAG Database (REPL):
/rag-load another-db
RAG settings are located in ~/.ollama-agent/settings.yaml under the rag section:
rag:
  rag_dir: /home/user/.ollama-agent/rag
  embedder_model: nomic-embed-text:latest
  embedder_base_url: http://localhost:11434
  embedding_dims: 768
  default_top_k: 5
  chunk_size: 500
  chunk_overlap: 50

- `rag_dir`: Directory where RAG databases are stored.
- `embedder_model`: Ollama model used to generate embeddings.
- `embedder_base_url`: Ollama host used for embedding requests.
- `embedding_dims`: Dimension of the embedding vectors (must match the model).
- `default_top_k`: Default number of results returned by searches.
- `chunk_size`: Maximum size of text chunks (in characters).
- `chunk_overlap`: Overlap between consecutive chunks.
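To illustrate how `chunk_size` and `chunk_overlap` interact, here is a plain character sliding window: each chunk starts `chunk_size - chunk_overlap` characters after the previous one. This is a sketch under that assumption; the splitter ollama-agent actually uses may also respect separators such as paragraphs or lines, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters overlapping by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

With the defaults, a 1,000-character document yields three chunks of 500, 500 and 100 characters, each sharing 50 characters with its predecessor.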
Skills are reusable agent capabilities that provide specialized workflows and domain knowledge. They follow the Agent Skills specification and are powered by DeepAgents skills.
When a prompt arrives, the agent checks skill descriptions to find relevant ones. Only when a skill matches does the agent read the full instructions — this pattern is called progressive disclosure and keeps the system prompt lean.
Each skill is a directory containing at least a SKILL.md file with YAML frontmatter:
~/.ollama-agent/skills/
├── langgraph-docs/
│   └── SKILL.md
└── arxiv-search/
    ├── SKILL.md
    └── arxiv_search.py
Example SKILL.md:
---
name: langgraph-docs
description: Use this skill for requests related to LangGraph in order to fetch relevant documentation to provide accurate, up-to-date guidance.
---
# langgraph-docs
## Overview
This skill explains how to access LangGraph Python documentation.
## Instructions
1. Fetch the documentation index using the fetch_url tool.
2. Select 2-4 most relevant documentation URLs.
3. Fetch selected documentation.
4. Provide accurate guidance based on the docs.

Additional files (scripts, templates, docs) can be placed alongside `SKILL.md`; just reference them in the instructions so the agent knows when and how to use them.
Skills are loaded from multiple directories in order (last wins for same-name skills):
- Global: `~/.ollama-agent/skills/`, user-level skills available in every session.
- Project: `./skills/`, project-specific skills in the current working directory.
- CLI extra: directories passed via `--skills-dir`.
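The "last wins" rule behaves like successive dictionary updates. The sketch below assumes a skill's name equals its directory name (the real loader may read `name` from the frontmatter instead), and `collect_skills` is a hypothetical helper; missing directories are skipped, matching the tips further down.

```python
from pathlib import Path
from typing import Dict, Iterable


def collect_skills(skill_dirs: Iterable[Path]) -> Dict[str, Path]:
    """Map skill name to SKILL.md path; later directories override earlier ones."""
    skills: Dict[str, Path] = {}
    for base in skill_dirs:
        if not base.is_dir():
            continue  # skills directories that don't exist are ignored
        for skill_md in sorted(base.glob("*/SKILL.md")):
            # Assumption: the skill name is its directory name.
            skills[skill_md.parent.name] = skill_md  # last wins
    return skills


# Global, then project; any --skills-dir paths would be appended in the order given.
search_order = [Path.home() / ".ollama-agent" / "skills", Path.cwd() / "skills"]
available = collect_skills(search_order)
```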
# Load additional skill sources
ollama-agent --skills-dir /path/to/team-skills --skills-dir /path/to/project-skills -p "Help me with LangGraph"

Create a Skill:
ollama-agent skill-create langgraph-docs \
--name "LangGraph Docs" \
--description "Fetch relevant LangGraph documentation" \
--instructions "Use fetch_url to read https://docs.langchain.com/llms.txt and select relevant pages."

Use `--force` to overwrite an existing skill.
List Skills:
ollama-agent skill-list
# or inside REPL: /skills

Show Skill Details:
ollama-agent skill-show langgraph-docs
# or inside REPL: /skill-show langgraph-docs

Delete a Skill:
ollama-agent skill-delete langgraph-docs
# or inside REPL: /skill-delete langgraph-docs

Inside the REPL you can create skills interactively:
/skill-create my-skill
The REPL will prompt for name, description, and then open a multiline editor for instructions (finish with Esc+Enter).
You can also create skills by hand — just create a directory under ~/.ollama-agent/skills/ with a SKILL.md file:
mkdir -p ~/.ollama-agent/skills/my-skill
cat > ~/.ollama-agent/skills/my-skill/SKILL.md << 'EOF'
---
name: my-skill
description: A custom skill that does something useful.
---
# my-skill
## Instructions
Your instructions here.
EOF

- Write clear, specific descriptions: the agent decides whether to use a skill based on the description alone.
- `SKILL.md` files must be under 10 MB; larger files are skipped.
- Descriptions longer than 1024 characters are truncated.
- Skills directories that don't exist are silently ignored.
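The limits above can be sketched as a frontmatter reader. This minimal version only understands flat `key: value` lines (a real loader would use a YAML parser), treats 10 MB as 10 MiB (an assumption), and the `read_skill_metadata` helper is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

MAX_SKILL_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_DESCRIPTION_CHARS = 1024


def read_skill_metadata(skill_md: Path) -> Optional[Dict[str, str]]:
    """Parse flat key: value frontmatter from SKILL.md, or return None to skip the file."""
    if skill_md.stat().st_size >= MAX_SKILL_BYTES:
        return None  # oversized files are skipped
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    else:
        return None  # frontmatter was never closed
    if "description" in meta:
        meta["description"] = meta["description"][:MAX_DESCRIPTION_CHARS]
    return meta
```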
You can customize the agent's behavior by editing the instructions file at ~/.ollama-agent/instructions.md. This file is automatically created on first use with default instructions.
The main agent supports the Model Context Protocol (MCP) to extend its capabilities with additional tools. MCP servers configured in ~/.ollama-agent/mcp_servers.json provide their tools directly to the main agent — no subagent wrapping.
Create ~/.ollama-agent/mcp_servers.json:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/documents"]
    },
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "your-key-here"
      }
    },
    "remote-api": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Supported transports:

- stdio: Set `command` (and optionally `args`, `env`) to launch a subprocess.
- http: Set `url` to connect to a remote MCP server.
All tools from all configured servers are loaded and made directly available to the main agent. If a server fails to connect, it is skipped and the agent continues normally.
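A sketch of how the two transports could be told apart when reading `mcp_servers.json`: entries with `command` are stdio servers and entries with `url` are HTTP servers. The `classify_servers` helper and its output shape are hypothetical, and skipping entries that have neither key is an assumption that mirrors the skip-on-failure behaviour described above.

```python
import json
from typing import Any, Dict


def classify_servers(raw_json: str) -> Dict[str, Dict[str, Any]]:
    """Return a per-server dict tagged with the transport implied by its keys."""
    config = json.loads(raw_json)
    servers: Dict[str, Dict[str, Any]] = {}
    for name, spec in config.get("mcpServers", {}).items():
        if "command" in spec:  # stdio: launch a subprocess
            servers[name] = {
                "transport": "stdio",
                "command": spec["command"],
                "args": list(spec.get("args", [])),
                "env": dict(spec.get("env", {})),
            }
        elif "url" in spec:  # http: connect to a remote server
            servers[name] = {"transport": "http", "url": spec["url"]}
        # Anything else is skipped so the remaining servers still load.
    return servers
```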
Define specialized subagents that your main agent can delegate tasks to. Each subagent has its own isolated context, model, skills, and MCP servers — keeping the orchestrator's context clean and focused.
Configure them in ~/.ollama-agent/settings.yaml:
subagents:
  - name: "research-agent"
    description: "Delegate here for complex research or web searches."
    system_prompt: "You are a research specialist. Search thoroughly and return concise summaries."
    model: "gemma4:26b" # Optional, inherits from main agent
    context_window: 65536 # Optional, inherits from main agent
    skills_paths:
      - "./skills/research"
    mcp_servers:
      - name: "brave-search"
        command: "npx"
        args: ["-y", "@modelcontextprotocol/server-brave-search"]
        env:
          BRAVE_API_KEY: "${BRAVE_API_KEY}"
  - name: "database-agent"
    description: "Delegate here when the user asks about customer or sales data."
    system_prompt: "You are a database analyst. Query the database and summarize results."
    skills_paths:
      - "./skills/database"
    mcp_servers:
      - name: "sqlite-server"
        command: "uvx"
        args: ["mcp-server-sqlite", "--db-path", "./data/ventas.db"]

- Context isolation: Subagent tool calls don't bloat the main agent's context; only the final result is returned.
- Environment injection: Use `${VAR_NAME}` in MCP `env` fields to inject secrets from the host environment.
- Skills & MCP per subagent: Each subagent can have its own skills directories and MCP server connections, completely independent from the main agent.
- Graceful failures: If a subagent's MCP server fails to load (e.g., missing env vars), it is skipped and the agent continues normally.
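A minimal sketch of `${VAR_NAME}` expansion combined with the skip-on-failure rule. The helpers `expand_env` and `load_subagent_servers` are hypothetical; the sketch assumes only the `${NAME}` form is expanded and that a missing variable causes the whole server to be skipped.

```python
import logging
import os
import re
from typing import Any, Dict, List, Mapping, Optional

VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
log = logging.getLogger(__name__)


class MissingEnvVar(KeyError):
    """Raised when a ${VAR_NAME} placeholder has no value in the host environment."""


def expand_env(env_spec: Mapping[str, str], host_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Replace every ${VAR_NAME} in the values of env_spec with the host value."""
    host = os.environ if host_env is None else host_env

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in host:
            raise MissingEnvVar(name)
        return host[name]

    return {key: VAR_RE.sub(substitute, value) for key, value in env_spec.items()}


def load_subagent_servers(
    servers: List[Dict[str, Any]], host_env: Optional[Mapping[str, str]] = None
) -> List[Dict[str, Any]]:
    """Expand env placeholders per server, skipping servers with missing variables."""
    loaded = []
    for server in servers:
        try:
            env = expand_env(server.get("env", {}), host_env)
        except MissingEnvVar as exc:
            log.warning("skipping MCP server %r: missing env var %s", server.get("name"), exc.args[0])
            continue
        loaded.append({**server, "env": env})
    return loaded
```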
Interested in contributing? Great! Here’s how to get started.
1. Clone the repository:

git clone https://github.com/arrase/ollama-agent.git
cd ollama-agent

2. Create a virtual environment:

python -m venv .venv
source .venv/bin/activate

3. Install in editable mode. This installs the project and its dependencies; the `-e` flag lets changes you make to the source code take effect immediately.

pip install -e .
- `ollama_agent/main.py`: Main application entry point.
- `ollama_agent/interfaces/`: CLI and REPL interface implementations.
- `ollama_agent/agent/`: Core agent logic (DeepAgents graph), session management, and built-in tools.
- `ollama_agent/core/`: Shared types, model capability checks, and common utilities.
- `ollama_agent/tasks/`: Task management system.
- `ollama_agent/skills/`: Skills management and DeepAgents skills integration.
- `ollama_agent/rag/`: RAG implementation for context retrieval.
- `ollama_agent/mcp/`: MCP server lifecycle and integration helpers.
- `ollama_agent/vision/`: Screen vision and screenshot analysis.
- `ollama_agent/streaming/`: Console output streaming, rendering, and non-interactive runner.
- `ollama_agent/settings/`: Application configuration and centralized filesystem paths.