
MiG-NJU/EvoEmbedding


EvoEmbedding


EvoEmbedding = Native Memory + Latent Embedding

Instead of encoding text segments into isolated static vectors, EvoEmbedding sequentially processes the input stream, continuously updates a Latent Memory Queue, and jointly generates context-aware, Evolvable Embeddings for precise long-context retrieval.


🎨 Framework

(Figure: EvoEmbedding overview)

Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. EvoEmbedding is a novel embedding model that generates evolvable representations for retrieval. It maintains a continuously updated latent memory as it processes inputs sequentially, and uses that memory alongside the raw content to jointly generate evolvable embeddings.

At each step, the model coordinates two decoupled, parallel operations to process incoming text segments:

  • 🧠 Memory Evolution: Automatically compresses the current text segment, fuses it with the previous native memory, and pushes the updated state into a bounded FIFO Latent Memory Queue. Because the queue has a fixed capacity, the model tracks state continuously while keeping memory overhead bounded.
  • ✨ Representation Generation: Dynamically combines the historical latent memory with the raw input segment to generate context-aware, Evolvable Embeddings. The resulting representations are highly sensitive to chronological order and semantic shifts.
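To make the two operations concrete, here is a toy, pure-Python sketch. It is not the EvoEmbedding architecture: mean pooling stands in for the learned compression, a weighted average stands in for fusion with the previous memory, and vector addition stands in for how the model combines memory with the raw segment. The class `ToyEvolvingEncoder` and its parameters `queue_size` and `fuse_weight` are hypothetical. Both operations read the same prior memory state, mirroring the decoupled, parallel design described above.

```python
import math
from collections import deque

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def normalize(v: Vector) -> Vector:
    """Scale to unit L2 norm (a zero vector stays all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class ToyEvolvingEncoder:
    """Hypothetical toy model of the two per-step operations."""

    def __init__(self, queue_size: int = 4, fuse_weight: float = 0.5):
        self.queue = deque(maxlen=queue_size)  # bounded FIFO latent memory queue
        self.fuse_weight = fuse_weight         # share of the previous state kept on fusion

    def step(self, segment_tokens: list[Vector]) -> Vector:
        segment = mean(segment_tokens)  # stand-in for learned compression
        history = list(self.queue)      # both operations read this same prior state

        # Representation generation: combine historical memory with the raw segment.
        if history:
            context = mean(history)
            embedding = normalize([s + c for s, c in zip(segment, context)])
        else:
            embedding = normalize(segment)

        # Memory evolution: fuse the segment with the latest state, then push it.
        if history:
            w = self.fuse_weight
            state = [w * p + (1 - w) * s for p, s in zip(history[-1], segment)]
        else:
            state = segment
        self.queue.append(state)  # once the queue is full, the oldest state is evicted
        return embedding


encoder = ToyEvolvingEncoder(queue_size=2)
for seg in ([[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]]):
    print(encoder.step(seg))
print(len(encoder.queue))  # → 2
```

Because the queue is a `deque` with `maxlen`, old states drop out automatically, so memory stays bounded however long the input stream runs.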

📊 Dataset

The released EvoTrain-180K dataset uses an intuitive, chat-style fine-tuning format designed for joint SFT (Supervised Fine-Tuning) and retrieval optimization.

Data Structure

Each training instance is a self-contained context window with the following key components:

  • messages: Standard alternating user/assistant turns representing the SFT generation target.
  • meta.turns: Formatted turn-level strings (concatenating User and Assistant messages) processed sequentially by the retriever path to simulate dynamic context streaming.
  • meta.evidence_turns: Zero-based indices pointing to the specific historical turn(s) containing the ground-truth evidence (serving as positive targets $v^+$ for contrastive loss).
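A quick structural check can catch malformed instances before training. The sketch below is a hypothetical helper (`validate_instance` is not part of the repository) that encodes the rules listed above, with two assumptions of ours: `messages` starts with a user turn and contains no system message, and the last entry of `meta.turns` is the query, so evidence indices must point at earlier entries.

```python
from typing import Any


def validate_instance(instance: dict[str, Any]) -> list[str]:
    """Return the problems found in one training instance (an empty list means OK)."""
    problems = []
    messages = instance.get("messages", [])
    meta = instance.get("meta", {})
    turns = meta.get("turns", [])
    evidence = meta.get("evidence_turns", [])

    # messages: alternating user/assistant turns ending with the assistant's answer.
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg.get("role") != expected:
            problems.append(f"messages[{i}] has role {msg.get('role')!r}, expected {expected!r}")
    if not messages or messages[-1].get("role") != "assistant":
        problems.append("messages must end with the assistant's answer")

    # meta.turns: at least one history turn, followed by the query as the last entry.
    if len(turns) < 2:
        problems.append("meta.turns needs at least one history turn plus the final query")

    # meta.evidence_turns: zero-based indices into the history, never the query itself.
    if not evidence:
        problems.append("meta.evidence_turns is empty")
    for idx in evidence:
        if not isinstance(idx, int) or not 0 <= idx < len(turns) - 1:
            problems.append(f"evidence index {idx!r} is out of range")
    return problems
```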

Here is a representative structural example (intermediate context omitted for brevity):

{
    "messages": [
        {
            "role": "user",
            "content": "[Target Evidence] I’m a retired cop... Any good local diners? No seafood, please."
        },
        {
            "role": "assistant",
            "content": "Try Mama’s Diner on Main St. No seafood on the menu."
        },
        
        ... (Multiple irrelevant dialogue turns / long context chunks omitted) ...
        
        {
            "role": "user",
            "content": "[Query] What type of restaurant did I say not to recommend earlier?"
        },
        {
            "role": "assistant",
            "content": "[Ground-truth Answer] Seafood."
        }
    ],
    "meta": {
        "evidence_turns": [
            0
        ],
        "turns": [
            "User: [Target Evidence] I’m a retired cop... Any good local diners? No seafood, please.\nAssistant: Try Mama’s Diner on Main St. No seafood on the menu.",
            "... (Multiple irrelevant dialogue turns / long context chunks omitted) ...",
            "[Query] What type of restaurant did I say not to recommend earlier?"
        ]
    }
}
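To see how the retriever path could consume this record, the sketch below splits `meta` into a query, positives, and candidate negatives. Treating every non-evidence turn as a negative is our assumption for illustration; the README only states that evidence turns serve as the positives $v^+$. The function name `contrastive_view` is hypothetical, and the second history turn in the demo is made-up filler standing in for the omitted context.

```python
def contrastive_view(meta: dict) -> tuple[str, list[str], list[str]]:
    """Split meta.turns into (query, positives, negatives)."""
    *history, query = meta["turns"]         # the query is the last entry
    evidence = set(meta["evidence_turns"])  # zero-based indices into the history
    positives = [t for i, t in enumerate(history) if i in evidence]
    # Assumption: every non-evidence history turn is a candidate negative.
    negatives = [t for i, t in enumerate(history) if i not in evidence]
    return query, positives, negatives


meta = {
    "evidence_turns": [0],
    "turns": [
        "User: I'm a retired cop... No seafood, please.\nAssistant: Try Mama's Diner on Main St.",
        "User: Any tips for my garden?\nAssistant: Water in the morning.",  # hypothetical filler
        "[Query] What type of restaurant did I say not to recommend earlier?",
    ],
}
query, positives, negatives = contrastive_view(meta)
print(len(positives), len(negatives))  # → 1 1
```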

Training on Custom Datasets

Our JSON schema makes it straightforward to adapt EvoEmbedding to your custom data, whether it's document QA, customer support logs, or specialized RAG databases.

To construct your own training data, follow these three steps:

  1. Chunk your context: Map your document chunks, paragraphs, or chat turns sequentially into the meta.turns list.
  2. Label the evidence: Identify which chunk(s) contain the answer and put their zero-based indices into meta.evidence_turns.
  3. Set the objective: Place the final query at the end of meta.turns, and ensure the messages array reflects the full context sequence ending with the assistant's correct response.

This design bypasses the need for complex vector-database setups during training. Our SFT pipeline natively consumes this format, allowing you to easily fine-tune the model to track dynamic states and retrieve accurately in your proprietary domain.
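For chat-style data, the three steps can be sketched as a small builder. `build_instance` is a hypothetical helper, not a repository API; it follows the example record by formatting each history turn as `User: ...\nAssistant: ...` and appending the bare query as the last entry of `meta.turns`. It covers the chat-log case only, since the README does not specify how `messages` should be filled for plain document chunks.

```python
import json


def build_instance(history, evidence_turns, query, answer):
    """Build one training record from chat-style data (hypothetical helper).

    history: list of (user_text, assistant_text) pairs in chronological order.
    evidence_turns: zero-based indices of the pairs that contain the answer.
    """
    # Step 2: evidence indices must point at history turns, never at the query.
    for idx in evidence_turns:
        if not 0 <= idx < len(history):
            raise ValueError(f"evidence index {idx} out of range")

    messages, turns = [], []
    for user_text, assistant_text in history:
        # Step 1: one meta.turns entry per history turn, in order.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
        turns.append(f"User: {user_text}\nAssistant: {assistant_text}")

    # Step 3: query last in meta.turns; messages end with the correct answer.
    turns.append(query)
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": answer})
    return {"messages": messages,
            "meta": {"evidence_turns": sorted(evidence_turns), "turns": turns}}


record = build_instance(
    history=[("No seafood, please. Any good diners?", "Try Mama's Diner.")],
    evidence_turns=[0],
    query="What type of restaurant did I say not to recommend earlier?",
    answer="Seafood.",
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```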

πŸ† Conclusions

1. State-of-the-Art Retrieval Performance

EvoEmbedding achieves superior results across 10 benchmarks, outperforming established static embedding models and larger specialist models (such as Qwen3-Embedding-8B and KaLM-Embedding-Gemma3-12B) despite having fewer parameters.

(Figure: EvoEmbedding performance)


2. Naive RAG Powered by EvoEmbedding Surpasses Dedicated Agentic Memory

A standard naive RAG pipeline using EvoEmbedding-4B outperforms complex, dedicated agentic memory architectures (such as Mem0 and MemoryOS) while spending no tokens on explicit memory construction at test time.

(Figure: EvoEmbedding performance vs. token cost on LongMemEval)

3. Plug-and-Play Compatibility with Agentic Workflows

EvoEmbedding works as a drop-in replacement for the retriever. Integrating it into existing baseline frameworks (such as A-MEM and LightMem) yields substantial performance gains (+19.2% and +13.5%, respectively) without modifying their core generative LLMs.

(Figure: Improvements)

4. Temporal Retrieval Capabilities

Unlike static embeddings that suffer from representation entanglement in long histories, EvoEmbedding's latent space is highly sensitive to chronological order. It successfully decouples temporal intents, excelling at queries constrained by temporal keywords (such as "firstly" and "lastly").

(Figure: Temporal sensitivity analysis)

🚀 Quick Start

Repository Structure

EvoEmbedding/
├── model/              # model implementation and client
├── train/              # training entrypoint
├── eval/               # evaluation scripts
├── docs/               # project page and visual assets
├── requirements-evoembedding-4b.txt
└── requirements-evoembedding-lite.txt

Environment

Use the matching environment and dependency file for the model family you want to run.

| Model size | Conda env | Requirements |
|---|---|---|
| EvoEmbedding-0.8B / 2B | evoemb | requirements-evoembedding-lite.txt |
| EvoEmbedding-4B | evoemb | requirements-evoembedding-4b.txt |

For example, for EvoEmbedding-4B:

conda activate evoemb
pip install -r requirements-evoembedding-4b.txt

Usage

As an Embedding Model

from model.client import EvoEmbeddingClient

client = EvoEmbeddingClient()

# Dialogue history in chronological order; the final user turn is the query.
messages = [
    {"role": "user", "content": "I visited Paris in April."},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "I bought a new laptop yesterday."},
    {"role": "assistant", "content": "Got it."},
    {"role": "user", "content": "Where did I travel in spring?"},
]

# Normalized embeddings for the history turns and the final query.
embeddings = client.encode_messages(messages)

Keep messages in the original dialogue order, with the query as the last user turn. encode_messages returns normalized embeddings for the history turns and the final query.
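Since the returned embeddings are normalized, cosine similarity reduces to a dot product, so ranking history turns against the query takes only a few lines. This sketch assumes, without confirmation from the README, that the output can be indexed as a sequence of vectors with the query embedding last (a NumPy array or PyTorch tensor could be converted with `.tolist()` first). `rank_history` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real model output.

```python
def rank_history(embeddings, top_k=None):
    """Rank history embeddings by dot-product similarity to the final (query) embedding.

    Assumes every vector is L2-normalized, so the dot product equals cosine similarity.
    """
    *history, query = embeddings
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in history]
    ranked = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k], scores  # ranked[:None] keeps every index


# Toy stand-ins for encode_messages output: two history turns, then the query.
toy_embeddings = [
    [1.0, 0.0, 0.0],  # history turn 0
    [0.0, 1.0, 0.0],  # history turn 1
    [0.6, 0.8, 0.0],  # query (unit length: 0.36 + 0.64 = 1)
]
ranked, scores = rank_history(toy_embeddings)
print(ranked)  # → [1, 0]
```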

As a Reranker

candidates = [
    "I visited Paris in April.",
    "I bought a new laptop yesterday.",
    "The meeting was moved to Friday.",
]
query = "Where did I travel in spring?"

ranked_candidates, ranked_indices = client.rerank(
    query,
    candidates,
    top_k=1,
    return_indices=True,
)

The reranker takes a plain list of candidate strings and returns them in order of relevance to the query; with return_indices=True it also returns their positions in the input list.

Training

Train each model size with its matching base model and dependency file. For EvoEmbedding-4B:

conda activate evoemb
pip install -r requirements-evoembedding-4b.txt
PYTHONPATH=. torchrun --nproc_per_node=8 train/train.py \
  --dataset_name MiG-NJU/EvoTrain-180K \
  --base_model Qwen/Qwen3-4B-Instruct-2507 \
  --output_dir ./output/evoembedding-4b

For the 0.8B and 2B variants, use the evoemb environment with requirements-evoembedding-lite.txt, and set --base_model and --output_dir to the corresponding paths.

Evaluation

Run a single benchmark:

PYTHONPATH=. python eval/eval.py \
  --eval_method rag \
  --model_name EvoEmbedding \
  --eval_bench locomo \
  --rag_sentence_num 16 \
  --embedding_model Qwen/Qwen3-Embedding-0.6B

Run the batch evaluation script:

PYTHONPATH=. bash eval/eval.sh

📄 Citation

@article{nie2026evoembedding,
  title={EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory},
  author={Nie, Chang and Fu, Chaoyou and Feng, Junlan and Shan, Caifeng},
  journal={arXiv preprint},
  year={2026}
}
