Simple, composable AI for Python, local or in the cloud.
Docs Β· Tutorials Β· How-to Β· Reference Β· Examples Β· Notebooks
AIMU is a Python library for AI-powered applications, with language models as the primary building block. It gives you a single provider-agnostic interface across text, images, audio, and speech, autonomous agents and code-controlled workflows, and small composable utilities for tools, memory, prompt tuning, evaluations, and benchmarking. All of these features in plain Python that is apparent and easy to use.
Whether you need vision input, autonomous tool use, image generation, audio generation, or text-to-speech, the call is one line:
aimu.chat("What's in this photo?", model="...", images=["photo.jpg"])
aimu.agent("...", tools=builtin.web).run("Search the web and summarize today's AI news")
aimu.generate_image("a watercolor fox in a snowy forest", model="...")
aimu.generate_audio("a lo-fi hip-hop beat with soft piano", model="...")
aimu.generate_speech("Hello, world!", model="...")Composition happens by passing objects to constructors. Conversation state is a list[dict] you can print and edit. Provider-specific details adapt at request time and never leak into your code.
AIMU is compact, direct, and easy to understand, by design. Six principles shape the API: plain Python, plain data (OpenAI message dicts only), composability through uniform interfaces, progressive disclosure of capabilities, direct paths for common tasks, and apparent failures. The reasoning behind each, and the patterns each one excludes, lives on the design principles page.
A curated model catalog, capturing model capabilities and nuances, is part of that design: every "provider:model_id" string must name a model AIMU ships a spec for. An unknown id raises rather than running with guessed capabilities. To use a one-off custom model, build the spec and pass it directly (aimu.image_client(HuggingFaceImageSpec(...))).
pip install aimu[all]Or pick the providers you need: aimu[ollama], aimu[anthropic], aimu[openai_compat] (also enables OpenAI TTS speech and transcription STT), aimu[hf] (text + HuggingFace diffusers image + HuggingFace audio + HuggingFace TTS speech), aimu[google] (Nano Banana image generation), aimu[llamacpp]. See installation in the docs for the full list of extras.
- One provider-agnostic client. A single interface covers Ollama, HuggingFace, llama-cpp, Claude, OpenAI, Gemini, and any OpenAI-compatible local server (LM Studio, vLLM, SGLang, llama-server, HF Transformers Serve), and you swap providers with a
"provider:model_id"string change. - Reasoning, tools, and vision everywhere. These capabilities work identically across every provider, and reasoning models surface their thinking as a distinct stream phase through the same API.
- Typed streaming. A streamed response arrives as
StreamChunkobjects tagged by phase, so reasoning, tool calls, and generated text are each labelled and you keep only the phases you want withinclude=. The same chunk type flows throughclient.chat(),Agent.run(), and every workflow, so one rendering loop handles them all. - Structured output. Pass
schema=(a dataclass or Pydantic model) tochat()orgenerate()to get a validated, typed object back, with native enforcement on OpenAI, Ollama, and Anthropic and a prompt-and-parse fallback (parse_json_response,generate_json) elsewhere. - Embeddings. Map text to vectors with
aimu.embedding_client()/aimu.embed()over OpenAI, Ollama, and local HuggingFacesentence-transformers, where a string returns one vector and a list returns a list of vectors. - Local weight reuse. HuggingFace clients across every modality (text, image, audio, speech) and llama-cpp share loaded weights through a process-level registry, so a second client for the same model skips the load, and
aimu.clear_hf_cache()/aimu.clear_llamacpp_cache()free VRAM on demand. - Timeouts and retries. Pass
timeout=andmax_retries=to any networked client (aimu.client(model, timeout=30, max_retries=5)); they forward straight to the provider SDK's own request timeout and bounded retry on transient failures, no extra machinery. - Provider failover. Wrap an ordered list of clients in a
FallbackClient(FallbackClient([primary, backup])); it tries each in turn on error, preserving conversation history across the switch. Since it's itself aBaseModelClient, it drops into anAgent, a workflow, or a benchmark unchanged. - Anthropic prompt caching. Opt in with
aimu.client("anthropic:...", cache_prompt=True)to cache the system prompt and tool schemas (the prefix an agent resends every turn) for cheaper, faster calls; cache token counts surface inclient.last_usage.
- Image generation. Create images with
aimu.image_client()/aimu.generate_image()using HuggingFacediffuserslocally (SD 1.5, SDXL, SD 3.5, FLUX) and Google Nano Banana in the cloud, and passreference_image=to anygenerate()for image-to-image. - Audio and speech synthesis. Generate music and sound with
generate_audio(MusicGen, AudioLDM2, Stable Audio Open) and spoken audio withgenerate_speech(HuggingFace SpeechT5, MMS-TTS, BARK, plus OpenAItts-1). - Transcription. Convert speech to text with
aimu.transcribe()over OpenAI (whisper-1, gpt-4o-transcribe) and the local HuggingFace Whisper family, requestingresponse_format="verbose_json"for timed segments.
- Autonomous agents. An
Agentruns a tool-using loop until the model stops calling tools, whileOrchestratorAgentcoordinates sub-agents and ships three prebuilt variants (CodeReviewAgent,ContentCreationAgent,ResearchReportAgent). - Code-controlled workflows. Build pipelines from
Chain,Router,Parallel, andEvaluatorOptimizer, each with afrom_client()factory, then compose them freely: workflows take agents as steps, andrunner.as_tool()hands any agent or workflow to another agent as a tool. - Interchangeable clients. Calling
agent.as_model_client()makes any agent a drop-inBaseModelClient, so agentic and non-agentic clients substitute for each other. - A2A interop. With the optional
a2aextra,serve_a2a(runner)exposes any agent or workflow over the Agent2Agent protocol, andRemoteAgent.connect(url)consumes a remote agent as a localRunner, so it composes (as a tool, a workflow step, or an orchestrator worker) exactly like a local one. The agent-level analog of MCP for tools. - Tracing. See exactly what a run did:
extract_tool_calls(messages)returns a clean list of{iteration, tool, arguments, result}records from any message history, andpretty_print(stream)renders a live run to the console with reasoning, tool calls, and output labelled. - Resumable runs. Persist a run's message history and rebuild its state later with
runner.restore(messages), which handles system-message de-duplication so any runner (Agent,Chain,EvaluatorOptimizer,Router,Parallel,OrchestratorAgent) can survive a crash or restart, on both the sync andaimu.aiosurfaces.
- Plain-function tools. Turn any plain function into a tool with
@tool, where type hints and the docstring become the spec. The model's call arguments are validated and coerced against those hints before the tool runs, with a clear error returned to the model when they don't fit. Passtools=tochat()orAgent.run()to override the tool set for one call (ortools=[]to disable). - Context across tool calls. Hand an agent a
deps=object and any tool reads or updates it through actx: ToolContext[Deps]parameter, so shared state (a store, a cache, config) carries across tool calls within a run without module globals, and the parameter stays hidden from the model-facing schema. - Pre-built tools. A library of ready-to-use tools ships in the box: web search and fetch, filesystem reads, a calculator, an opt-in sandboxed Python REPL, and the generative-modality tools, grouped as
builtin.web/fs/compute/misc/image/audio/speech/transcriptionto pass straight totools=. - MCP integration. Bring cross-process FastMCP tools into the same registry with
MCPClientandmcp.as_tools(), mixing them freely with@toolfunctions (tools=builtin.web + mcp.as_tools()). - Agent skills. A
SkillAgentdiscoversSKILL.mdfiles on the filesystem (the same format Claude Code uses) and injects their instructions and tools on demand, so you extend an agent by dropping in a folder instead of changing code.
- Uniform store interface. The
SemanticMemoryStore,DocumentStore, andConversationManagerclasses all implementMemoryStoreand are interchangeable wherever one is accepted. - Semantic memory. ChromaDB vector search backs fact storage and retrieval, and you can pass
embedding_client=to use your own embedding model instead of ChromaDB's default. - Document and conversation stores. A path-keyed
DocumentStore(drop-in compatible with the Claude memory tool API) and a TinyDB-backedConversationManagerpersist documents and chat history across sessions. - Memory tools. Calling
make_memory_tools(store)exposes store, search, and list as agent tools, while FastMCP servers inaimu.memory.mcp/aimu.memory.document_mcpcover cross-process and multi-agent use. - Retrieval-augmented generation. The
aimu.ragmodule provides plain functions overMemoryStore(split_text,ingest,retrieve,format_context, and optional cross-encoderrerank) with no retriever, splitter, or loader class hierarchy.
- Prompt tuning. A hill-climbing
PromptTuneroptimises prompts against labelled data, with four concrete tuners for classification, multi-class, extraction, and judged-generation. - Benchmarking. The
Benchmarkharness runs one prompt across multiple clients (plain or agentic, mixed providers) and returns a comparison DataFrame, and DeepEval metrics plug in asScorers.
- Full async mirror. The
aimu.aionamespace mirrors the entire public surface with the same class names, so one import switches paradigms while the sync ladder stays unchanged, and the same@toolfunctions work on both surfaces. - Structured concurrency. Both
aio.Parallelandconcurrent_tool_calls=Trueuseasyncio.TaskGroupfor sibling cancellation andExceptionGroupaggregation, with native async providers throughout and in-process providers (HuggingFace, LlamaCpp) wrapping a sync client so weights load once.
One-shot with aimu.chat(), multi-turn with aimu.client(), and streaming with phase filtering (thinking, tool usage, generation) via include=. Omit model= and AIMU resolves one for you by reading environment variables or auto-selects an available local model (LLMs only) via a running Ollama server, a cached HuggingFace model, or a local OpenAI-compatible server.
import aimu
# One-shot
text = aimu.chat("Hello", model="anthropic:claude-sonnet-4-6")
# Multi-turn (history preserved across calls)
client = aimu.client("ollama:qwen3.5:9b", system="You are concise.")
client.chat("Hi there")
client.chat("What did I just say?")
# Default model: resolves AIMU_LANGUAGE_MODEL or a discovered local model
reply = aimu.chat("Hello")
client = aimu.client(system="Be brief.")
# Streaming: drop unwanted phases (thinking, tool calls) with include=
for chunk in client.chat("Tell me a story", stream=True, include=["generating"]):
print(chunk.content, end="", flush=True)@aimu.tool turns any plain function into a tool (type hints + docstring become the spec). Chain.from_client() runs a series of LLM calls over a shared client with per-step instructions; Router, Parallel, and EvaluatorOptimizer follow the same shape.
import aimu
from aimu.agents import Chain
@aimu.tool
def letter_counter(word: str, letter: str) -> int:
"""Count occurrences of a letter in a word."""
return word.lower().count(letter.lower())
agent = aimu.agent("ollama:qwen3.5:9b", tools=[letter_counter])
print(agent.run("How many r's in strawberry?"))
chain = Chain.from_client(agent.model_client, [
"Break the task into clear steps.",
"Execute each step using available tools.",
"Polish the result into a single paragraph.",
])
result = chain.run("Research the top Python web frameworks.")Tools can reach shared state through an injected ToolContext (no globals), a critic can return a
typed verdict instead of a magic string, and pretty_print renders a streamed run:
import aimu
from dataclasses import dataclass, field
from pydantic import BaseModel
from aimu import ToolContext
from aimu.agents import Agent, EvaluatorOptimizer
@dataclass
class Deps:
seen: dict = field(default_factory=dict)
@aimu.tool
def remember(ctx: ToolContext[Deps], key: str, value: str) -> str:
"""Store a value under a key.""" # the model only sees key + value; ctx is injected
ctx.deps.seen[key] = value
return "ok"
writer = Agent(aimu.client("ollama:qwen3.5:9b"), tools=[remember], deps=Deps())
aimu.pretty_print(writer.run("Remember the sky is blue, then summarize.", stream=True))
class Verdict(BaseModel):
passed: bool
feedback: str = ""
critic = Agent(aimu.client("ollama:qwen3.5:9b"), "Judge the draft; set passed and feedback.")
review = EvaluatorOptimizer(generator=writer, evaluator=critic, verdict_schema=Verdict)
print(review.run("Explain gradient descent."))Pass images= or audio= to any vision- or audio-capable text model, on stateful chat() or stateless one-shot generate(). Both accept a file path (str or pathlib.Path), raw bytes, a data:...;base64,... URL, or an https:// URL. Audio providers: OpenAI (GPT-4o, GPT-4.1 series), Gemini (2.0/2.5), HuggingFace Gemma 4 / Nemotron-H-8B.
client = aimu.client("openai:gpt-4o")
# Vision
client.chat("What's in this image?", images=["./cat.jpg"]) # multi-turn, keeps history
client.generate("Caption this image.", images=["./cat.jpg"]) # one-shot, no history
# Audio
client.chat("Transcribe this clip.", audio=["./interview.wav"]) # multi-turn
client.generate("What language is spoken here?", audio=["./clip.mp3"]) # one-shotEach modality has a parallel factory (image_client / audio_client / speech_client) and one-shot helper (generate_image / generate_audio / generate_speech), all using the same provider:model_id shape. Pass reference_image= to any image generate() for image-to-image.
# Image: local HuggingFace diffusers, with image-to-image
path = aimu.generate_image("a watercolor of a fox in a snowy forest", model="hf:runwayml/stable-diffusion-v1-5")
client = aimu.image_client("hf:stabilityai/stable-diffusion-xl-base-1.0") # reuse loaded weights
img = client.generate("a cyberpunk city skyline at dusk")
img = client.generate("a cyberpunk version", reference_image="./photo.jpg", strength=0.7)# Audio (music and sound): returns (sample_rate, np.ndarray) by default
sr, audio = aimu.generate_audio("a lo-fi hip-hop beat with soft piano", model="hf:facebook/musicgen-small", duration_s=5.0)
path = aimu.generate_audio("ambient ocean waves", model="hf:facebook/musicgen-small", format="path")# Speech synthesis (text-to-speech)
path = aimu.generate_speech("Hello, world!", model="openai:tts-1")
sr, audio = aimu.generate_speech("Hello!", model="hf:facebook/mms-tts-eng", format="numpy")
tts = aimu.speech_client("openai:tts-1-hd") # reuse a client across calls
path = tts.generate("Good morning.", voice="nova", format="path")# Transcription (speech-to-text)
text = aimu.transcribe("./clip.wav", model="openai:whisper-1")
stt = aimu.transcription_client("hf:openai/whisper-tiny")
text = stt.transcribe("./clip.wav")Pass schema= (a dataclass or Pydantic model) to chat() / generate() to get a typed object back. Native enforcement on capable models (OpenAI / Ollama / Anthropic), with a prompt-and-parse fallback otherwise.
import aimu
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int
person = aimu.client("openai:gpt-4.1").chat("Extract: Ada Lovelace, 36", schema=Person)
# Person(name="Ada Lovelace", age=36)aimu.embed() / aimu.embedding_client() map text to vectors (single string β one vector, list β list of vectors). RAG is plain functions over a MemoryStore: chunk with ingest, fetch with retrieve, ground with format_context.
import aimu
from aimu.memory import SemanticMemoryStore
from aimu.rag import ingest, retrieve, format_context
vector = aimu.embed("the quick brown fox", model="openai:text-embedding-3-small") # list[float]
embedder = aimu.embedding_client("hf:BAAI/bge-small-en-v1.5") # local sentence-transformers
vectors = embedder.embed(["alpha", "beta"]) # list[list[float]]
store = SemanticMemoryStore(embedding_client=embedder) # use a chosen embedding model
ingest(store, my_documents, chunk_size=800, chunk_overlap=100)
question = "What is AIMU's design philosophy?"
context = format_context(retrieve(store, question, n_results=5))
answer = aimu.chat(f"Context:\n{context}\n\nQuestion: {question}")Agents combine perception, generation, and memory in a single run. A vision-capable agent can take generate_image as a tool; make_memory_tools(store) adds store_memory, search_memories, and list_memories over an explicit (ephemeral or on-disk) store.
from aimu.agents import Agent
from aimu.tools import builtin
from aimu.tools.builtin import make_memory_tools
from aimu.memory import SemanticMemoryStore
# Perceive and create in one run
agent = Agent(aimu.client("anthropic:claude-sonnet-4-6"), tools=[builtin.generate_image])
agent.run("Describe the scene in this photo, then generate a watercolor painting of it.", images=["photo.jpg"])
# Memory across turns
store = SemanticMemoryStore(persist_path="./.memory")
agent = Agent(aimu.client("anthropic:claude-sonnet-4-6"), tools=make_memory_tools(store))
agent.run("Remember that the meeting is on Friday at 2pm.")
agent.run("When is the meeting?")aimu.aio mirrors the entire public surface with the same class names, so one import switches paradigms. aio.Parallel and concurrent_tool_calls=True use asyncio.TaskGroup for true coroutine concurrency.
import asyncio
from aimu import aio
async def main():
client = aio.client("anthropic:claude-sonnet-4-6")
agent = aio.Agent(client, tools=[my_async_tool])
reply = await agent.run("Hello")
parallel = aio.Parallel.from_client(client, worker_prompts=[...], aggregator_prompt="...")
result = await parallel.run("topic")
asyncio.run(main())- π Tutorials: Hand-held walkthroughs. Install to first agent in 15 mins
- π οΈ How-to guides: Task-oriented recipes (switch providers, write a tool, stream output, benchmark models, ...)
- π Reference: Auto-generated API docs, capability matrices, environment variables, CLI
- π‘ Explanation: The why: architecture, design principles, agents vs workflows
The notebooks/ directory ships one runnable demo per subsystem, numbered 01β23 and ordered to build up incrementally, from the model client, tools, and agents through workflows, memory, RAG, the generative modalities, the async surface, and agent composition / A2A interop. The filenames are self-describing; open the directory to browse and run them.
The examples/ directory ships larger, real-world programs organized by theme (each has its own README):
- text-refinement/: A generate β judge β refine loop over text, implemented four ways (code loop,
Agent,EvaluatorOptimizer, simulated annealing). GPU-free, Ollama-only. - image-refinement/: The same loop over images (diffusion + vision evaluator), five variants including img2img. Needs
aimu[hf]and a GPU. - news-summarizer/: One task (summarize recent AI news) solved with
Agent,Chain,Parallel, andOrchestratorAgent, selected via--method. - skills/: Demo
SKILL.mdskills (haiku-poet,unit-converter) forSkillAgentdiscovery, exposed asaimu.paths.skills.
The web/ directory ships chat applications that demonstrate AIMU in action:
- streamlit_chatbot_basic.py: ~70-line showcase. Provider/model selector, streaming chat, built-in tools. Start here.
- streamlit_chatbot.py: Full-featured. Image/audio/speech generation, agentic mode, thinking display, generation sliders, live TTS narration. Extensible foundation.
- gradio_chatbot_basic.py: Basic Gradio chat interface with streaming.
streamlit run web/streamlit_chatbot.py # full-featured Streamlit demo (agents, tools, images, audio, speech narration, etc.)
streamlit run web/streamlit_chatbot_basic.py # basic Streamlit demo app
python web/gradio_chatbot_basic.py # basic Gradio demo appSee the contributing guide for dev setup, testing, lint, and PR conventions.
Apache 2.0. See LICENSE.