Replies: 1 comment 2 replies
-
|
@xerudro — agentmemory already runs against Ollama / LM Studio / vLLM / llama.cpp / any OpenAI-API-compatible local server. The docs gap is real though — the only mention was buried in a one-line comment inside the env-example block, so it was easy to miss. Quick copy-paste for the two most common setups: Ollama (default port ollama pull qwen2.5-coder:7b
ollama serve# ~/.agentmemory/.env
OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_MODEL=qwen2.5-coder:7bLM Studio (default port Open LM Studio → Local Server tab → Start Server (any chat model in the picker). # ~/.agentmemory/.env
OPENAI_API_KEY=lmstudio
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_MODEL=qwen2.5-coder-7b-instructRestart agentmemory and the consolidation pipeline, compression, summarization, and graph extraction all run against your local server. Zero paid LLM calls. Embeddings are local-by-default too ( For memory work specifically, a 7B instruct model (Qwen 2.5 Coder, Llama 3.2, Mistral, DeepSeek-R1) is plenty — compression is short summarization, not full reasoning. The 3B/7B size range fits on consumer hardware (4-5 GB RAM) and runs faster than the paid APIs. I've opened #697 to add a dedicated "Local models (Ollama / LM Studio / vLLM)" section under the LLM Providers docs so the next person doesn't have to ask. Includes the model-pick table and a callout for the reasoning-model empty-content pitfall (some local servers don't surface the Heads-up if you're using a configured provider but seeing no graph nodes / lessons / crystals: that's a separate bug (consolidation defaulting to off). Fix landing in #696 — until that merges, set |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
This tool should be compatible to run with local lm agents thru Ollama or LMStudio also. Why all the tools runs on paid LMs ? We have hardware to run them locally also.
Beta Was this translation helpful? Give feedback.
All reactions