EvoEmbedding = Native Memory + Latent Embedding
Instead of encoding text segments into isolated static vectors, EvoEmbedding sequentially processes the input stream, continuously updates a Latent Memory Queue, and jointly generates context-aware, Evolvable Embeddings for precise long-context retrieval.
- Framework
- Dataset
- Conclusions
- Quick Start
- Citation
Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. EvoEmbedding is a novel embedding model that generates evolvable representations for retrieval. It maintains a continuously updated latent memory as it sequentially processes inputs, and uses that memory alongside the raw content to jointly generate evolvable embeddings.
At each step, the model coordinates two decoupled, parallel operations to process incoming text segments:
- Memory Evolution: Automatically compresses the current text segment and fuses it with previous native memory, pushing the updated state into a bounded FIFO Latent Memory Queue. This ensures continuous state tracking without massive memory overhead.
- Representation Generation: Dynamically combines the historical latent memory with the raw input segment to generate context-aware, Evolvable Embeddings. The resulting representations are highly sensitive to chronological order and semantic shifts.
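The two operations above can be illustrated with a toy sketch. This is not the model's implementation: `toy_encode`, the mean-pooling of memory, and the 0.5 mixing weights are all placeholder assumptions, and the two operations run one after the other here rather than in parallel. The sketch only shows how a bounded FIFO queue makes the embedding of a segment depend on what came before it.

```python
from collections import deque
import hashlib
import math

DIM = 8  # toy embedding width (assumption, not the model's real size)


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_encode(text):
    """Hypothetical stand-in for the encoder: a deterministic hashed unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return normalize([b / 255.0 - 0.5 for b in digest[:DIM]])


class ToyEvolvingEncoder:
    def __init__(self, queue_size=4):
        # deque(maxlen=...) evicts the oldest state on overflow: a bounded FIFO queue.
        self.memory = deque(maxlen=queue_size)

    def step(self, segment):
        raw = toy_encode(segment)
        history = list(self.memory)  # snapshot of memory before this segment

        # Representation generation: mix the raw segment with the mean of past states.
        if history:
            context = [sum(col) / len(history) for col in zip(*history)]
        else:
            context = [0.0] * DIM
        embedding = normalize([r + 0.5 * c for r, c in zip(raw, context)])

        # Memory evolution: fuse the segment with the latest state and enqueue it.
        previous = history[-1] if history else [0.0] * DIM
        fused = normalize([0.5 * r + 0.5 * p for r, p in zip(raw, previous)])
        self.memory.append(fused)
        return embedding


encoder = ToyEvolvingEncoder(queue_size=2)
first = encoder.step("I visited Paris in April.")
encoder.step("I bought a new laptop yesterday.")
again = encoder.step("I visited Paris in April.")
print(first != again)       # the same text embeds differently once history exists
print(len(encoder.memory))  # the queue never grows past queue_size
```

A static encoder would return identical vectors for `first` and `again`; here the second call sees a non-empty memory queue, so the two differ.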
The released EvoTrain-180K dataset uses an intuitive, chat-style fine-tuning format designed for joint SFT (Supervised Fine-Tuning) and retrieval optimization.
Each training instance is a self-contained context window structured with the following key components:
- `messages`: Standard alternating user/assistant turns representing the SFT generation target.
- `meta.turns`: Formatted turn-level strings (concatenating User and Assistant messages) processed sequentially by the retriever path to simulate dynamic context streaming.
- `meta.evidence_turns`: Zero-based indices pointing to the specific historical turn(s) containing the ground-truth evidence (serving as positive targets $v^+$ for contrastive loss).
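To make the role of `meta.evidence_turns` concrete, here is a minimal sketch of a contrastive objective in which the evidence turns act as positives and all other turns act as negatives. The exact loss used in training is not specified here; the InfoNCE form, the `temperature` value, and the function name `info_nce` are assumptions for illustration only.

```python
import math


def info_nce(query, candidates, positive_indices, temperature=0.05):
    """InfoNCE over dot-product scores; assumes all vectors are unit-normalized.

    `positive_indices` plays the role of meta.evidence_turns (assumed mapping).
    """
    scores = [
        sum(q * c for q, c in zip(query, cand)) / temperature
        for cand in candidates
    ]
    max_score = max(scores)  # subtract the max for numerical stability
    log_denominator = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores)
    )
    # Average the negative log-likelihood over every evidence (positive) turn.
    losses = [log_denominator - scores[i] for i in positive_indices]
    return sum(losses) / len(losses)


turn_vectors = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings for two history turns
query_vector = [1.0, 0.0]                # toy query embedding, aligned with turn 0
print(info_nce(query_vector, turn_vectors, positive_indices=[0]))
print(info_nce(query_vector, turn_vectors, positive_indices=[1]))
```

The loss is small when the query embedding sits close to its evidence turn and large when the labeled evidence is a turn the query does not resemble.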
Here is a representative structural example (intermediate context omitted for brevity):
{
  "messages": [
    {
      "role": "user",
      "content": "[Target Evidence] I'm a retired cop... Any good local diners? No seafood, please."
    },
    {
      "role": "assistant",
      "content": "Try Mama's Diner on Main St. No seafood on the menu."
    },
    ... (Multiple irrelevant dialogue turns / long context chunks omitted) ...
    {
      "role": "user",
      "content": "[Query] What type of restaurant did I say not to recommend earlier?"
    },
    {
      "role": "assistant",
      "content": "[Ground-truth Answer] Seafood."
    }
  ],
  "meta": {
    "evidence_turns": [
      0
    ],
    "turns": [
      "User: [Target Evidence] I'm a retired cop... Any good local diners? No seafood, please.\nAssistant: Try Mama's Diner on Main St. No seafood on the menu.",
      "... (Multiple irrelevant dialogue turns / long context chunks omitted) ...",
      "[Query] What type of restaurant did I say not to recommend earlier?"
    ]
  }
}

Our JSON schema makes it straightforward to adapt EvoEmbedding to your custom data, whether it's document QA, customer support logs, or specialized RAG databases.
To construct your own training data, simply follow these 3 steps:
- Chunk your context: Map your document chunks, paragraphs, or chat turns sequentially into the `meta.turns` list.
- Label the evidence: Identify which chunk(s) contain the answer and put their zero-based indices into `meta.evidence_turns`.
- Set the objective: Place the final query at the end of `meta.turns`, and ensure the `messages` array reflects the full context sequence ending with the assistant's correct response.
This design bypasses the need for complex vector-database setups during training. Our SFT pipeline natively consumes this format, allowing you to easily fine-tune the model to track dynamic states and retrieve accurately in your proprietary domain.
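As a rough illustration of the three steps, the helper below assembles one instance from chat history. `build_instance` is a hypothetical name, not part of the repository, and the `User: ...\nAssistant: ...` turn format simply mirrors the example shown earlier. The bracketed tags in that example (`[Target Evidence]`, `[Query]`) are left out here, since it is not stated whether real data should carry them.

```python
import json


def build_instance(turn_pairs, evidence_turns, query, answer):
    """Assemble one training instance (hypothetical helper, not from the repo).

    turn_pairs:     list of (user_text, assistant_text) history turns, in order
    evidence_turns: zero-based indices of the turns that contain the answer
    """
    if not evidence_turns:
        raise ValueError("at least one evidence turn is required")
    for idx in evidence_turns:
        if not 0 <= idx < len(turn_pairs):
            raise ValueError(f"evidence index {idx} is outside the history")

    messages, turns = [], []
    # Step 1: map each history turn into both `messages` and `meta.turns`.
    for user_text, assistant_text in turn_pairs:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
        turns.append(f"User: {user_text}\nAssistant: {assistant_text}")

    # Step 3: the query closes `meta.turns`; the answer closes `messages`.
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": answer})
    turns.append(query)

    # Step 2: the evidence labels are stored alongside the turns.
    meta = {"evidence_turns": list(evidence_turns), "turns": turns}
    return {"messages": messages, "meta": meta}


instance = build_instance(
    turn_pairs=[
        ("Any good local diners? No seafood, please.", "Try Mama's Diner on Main St."),
        ("What's the weather like tomorrow?", "Sunny with a light breeze."),
    ],
    evidence_turns=[0],
    query="What type of restaurant did I say not to recommend earlier?",
    answer="Seafood.",
)
print(json.dumps(instance, indent=2))
```

For document QA, the same shape should work with document chunks in place of chat turns, though the turn string format would then be your own choice.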
EvoEmbedding achieves superior results across 10 benchmarks, outperforming established static embedding models and larger specialist models (such as Qwen3-Embedding-8B and KaLM-Embedding-Gemma3-12B) despite having fewer parameters.
A standard naive RAG pipeline using EvoEmbedding-4B outperforms complex, dedicated agentic memory architectures (such as Mem0 and MemoryOS) while incurring no token overhead for explicit memory construction at test time.
EvoEmbedding is highly compatible as a drop-in replacement. Integrating it into existing baseline frameworks (like A-MEM and LightMem) yields substantial performance gains (+19.2% and +13.5% respectively) without modifying the core generative LLMs.
Unlike static embeddings that suffer from representation entanglement in long histories, EvoEmbedding's latent space is highly sensitive to chronological order. It successfully decouples temporal intents, excelling at queries constrained by temporal keywords (such as "firstly" and "lastly").
EvoEmbedding/
├── model/   # model implementation and client
├── train/   # training entrypoint
├── eval/    # evaluation scripts
├── docs/    # project page and visual assets
├── requirements-evoembedding-4b.txt
└── requirements-evoembedding-lite.txt
Use the matching environment and dependency file for the model family you want to run.
| Model size | Conda env | Requirements |
|---|---|---|
| EvoEmbedding-0.8B / 2B | evoemb | requirements-evoembedding-lite.txt |
| EvoEmbedding-4B | evoemb | requirements-evoembedding-4b.txt |
conda activate evoemb
pip install -r requirements-evoembedding-4b.txt

from model.client import EvoEmbeddingClient
client = EvoEmbeddingClient()
messages = [
{"role": "user", "content": "I visited Paris in April."},
{"role": "assistant", "content": "Noted."},
{"role": "user", "content": "I bought a new laptop yesterday."},
{"role": "assistant", "content": "Got it."},
{"role": "user", "content": "Where did I travel in spring?"},
]
embeddings = client.encode_messages(messages)

The `messages` input preserves the original dialogue order. `encode_messages` returns normalized embeddings for the history turns and the final query.
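Because the returned embeddings are normalized, a plain dot product is equivalent to cosine similarity. The sketch below ranks history turns against the query under one loud assumption: that the result can be read as a sequence of equal-length vectors whose last row is the query. That layout, the helper name `top_k_history`, and whether a "history turn" is a single message or a user/assistant pair are all guesses to be checked against the actual return value of `encode_messages`.

```python
def top_k_history(embeddings, k=1):
    """Rank history turns against the query by dot product.

    Assumes `embeddings` is a sequence of equal-length, unit-norm vectors whose
    last row is the query. This layout is an assumption, not documented behavior.
    """
    rows = [[float(x) for x in row] for row in embeddings]
    *history, query = rows
    scores = [sum(q * h for q, h in zip(query, row)) for row in history]
    order = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Toy unit vectors standing in for real model output; the last row is the query.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]
for index, score in top_k_history(toy_embeddings, k=2):
    print(index, round(score, 2))
```

If the client returns a tensor or array, converting it to nested lists first (for example with `.tolist()`) should be enough for this helper to apply.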
candidates = [
"I visited Paris in April.",
"I bought a new laptop yesterday.",
"The meeting was moved to Friday.",
]
query = "Where did I travel in spring?"
ranked_candidates, ranked_indices = client.rerank(
query,
candidates,
top_k=1,
return_indices=True,
)

The reranker takes a direct list of candidate strings and returns them in relevance order.
Train each model size with its matching base model and dependency file:
conda activate evoemb
pip install -r requirements-evoembedding-4b.txt
PYTHONPATH=. torchrun --nproc_per_node=8 train/train.py \
--dataset_name MiG-NJU/EvoTrain-180K \
--base_model Qwen/Qwen3-4B-Instruct-2507 \
--output_dir ./output/evoembedding-4b

For the 0.8B and 2B variants, use `evoemb` with `requirements-evoembedding-lite.txt` and replace `--base_model` and `--output_dir` with the corresponding model paths.
Run a single benchmark:
PYTHONPATH=. python eval/eval.py \
--eval_method rag \
--model_name EvoEmbedding \
--eval_bench locomo \
--rag_sentence_num 16 \
--embedding_model Qwen/Qwen3-Embedding-0.6B

Run the batch evaluation script:
PYTHONPATH=. bash eval/eval.sh

@article{nie2026evoembedding,
title={EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory},
author={Nie, Chang and Fu, Chaoyou and Feng, Junlan and Shan, Caifeng},
journal={arXiv preprint},
year={2026}
}




