OpenAI API-compatible serving built with FastAPI + gRPC.
LarkMemoryCore 是一个面向飞书项目决策记忆和真实模型后端的 OpenAI API 兼容推理服务,采用 FastAPI + gRPC 双进程架构。
Commercial identity for the RuyiAI-Stack launch:
- Organization:
RuyiAI-Stack - Public site:
https://ruyiai-stack.github.io - Public repository:
RuyiAI-Stack/ruyiai-stack.github.io
- the official build and test entrypoint is root
CMake + Ninja + CTest - Python API unit tests are part of the default CTest graph
- real model validation stays explicit and is not hidden behind unit-test targets
- deployment scripts assume a target Linux host with
systemd --user
If you are changing this repo, expect to run explicit configure, proto generation, build, unit tests, and real-model validation steps.
如果你要修改这个仓库,默认就应该显式执行 configure、protobuf 生成、构建、单元测试,以及真实模型验证,而不是依赖一条“自动到底”的命令。
compute_server/: C++ gRPC backendapi_server/: FastAPI API layerproto/: shared protobuf schemaapi_server/proto/: generated Python protobuf stubstests/python/: API unit and contract teststests/integration_real/: real-model integration testsbenchmarks/: real-data benchmark entrypointsexamples/: SDK and smoke examplesops/: deployment and host orchestration scriptsdocs/: deployment and validation documentationcompetition/feishu_office/: competition dataset, training, evaluation, and tuned-model runtime assets
The Feishu Office Assistant competition delivery is implemented in
competition/feishu_office/ and the companion docs:
docs/competition-feishu-office-dataset.mddocs/competition-feishu-office-training-report.mddocs/competition-feishu-office-effect-report.mddocs/competition-feishu-office-demo.mddocs/memory-definition-architecture-whitepaper.mddocs/memory-benchmark-report.md
Competition runtime helpers:
./ops/feishu_office_train_env.sh./ops/feishu_office_competition_preflight.sh./ops/feishu_office_competition_start.sh./ops/feishu_office_competition_stop.sh./ops/feishu_office_competition_start_kimi.sh— slim launcher when onlymoonshot/kimi-k2.5is needed (skips the HuggingFace adapter daemon).
Exact end-to-end reproduction:
docs/competition-feishu-office-reproduction.md
ops/feishu_bridge.py is a long-connection (WebSocket) bridge that lets a
Feishu / Lark bot talk to LarkMemoryCore. It receives im.message.receive_v1
events from the Feishu Open Platform via lark-oapi, calls
/v1/chat/completions with project-memory metadata, and replies in the same
chat. No public webhook URL is required.
Feishu user ──(WS)──▶ feishu_bridge.py ──(HTTP)──▶ LarkMemoryCore /v1/chat/completions
│ │
│ └─▶ moonshot/kimi-k2.5
│ via ops/openclaw_kimi_cli.py
└─(reply)── Feishu im.v1.message.reply
- Single chat (
p2p): every message is treated as a conversation with the bot and gets a reply. - Group chat: the bridge inspects
message.mentions[*].id.open_idand only replies when the bot's ownopen_idis mentioned. Non-mention messages are still recorded with anobserve-onlylog line, which is useful when the app is grantedim:message.group_msg(:readonly)so the bot needs to read every message but only respond when explicitly addressed. - Messages are processed on a 4-thread worker pool so the WebSocket handler
acks each event in milliseconds. A 10-minute / 500-entry sliding-window
dedup map drops Feishu retries with the same
message_id.
Apply on the Feishu Open Platform under Permission Management, then publish a new app version (permissions do not take effect until a version is approved):
| Permission | Purpose |
|---|---|
im:message |
Base IM capability |
im:message:send_as_bot |
Reply as the bot |
im:message.group_at_msg (+ :readonly) |
Receive @bot events in groups |
im:message.p2p_msg (+ :readonly) |
Receive 1:1 messages |
im:message.group_msg.readonly |
(Optional) read every group message; required only if you want the bridge to log non-@ messages too |
Then under Event Subscription select Use long connection to receive
events and add the im.message.receive_v1 event.
Copy the template and fill in real credentials:
cp ops/feishu_bridge.env.example ops/feishu_bridge.env
chmod 600 ops/feishu_bridge.envRequired fields:
| Variable | Description |
|---|---|
FEISHU_APP_ID, FEISHU_APP_SECRET |
App credentials from the Open Platform. |
LARK_MEMORY_CORE_API_KEY |
Same key used by the API server (.run/feishu-office-competition/runtime/api_key.txt). |
LARK_MEMORY_CORE_BASE_URL |
Default http://127.0.0.1:18100. |
LARK_MEMORY_CORE_MODEL |
Default moonshot/kimi-k2.5. |
LARK_MEMORY_CORE_TENANT_ID, ..._PROJECT_ID |
Memory scope; must match the events injected via seed_memory_engine for hits to count. |
FEISHU_BOT_OPEN_ID |
Bot open_id, used to detect @bot in groups. Look up via GET https://open.feishu.cn/open-apis/bot/v3/info. |
FEISHU_VERIFY_TOKEN, FEISHU_ENCRYPT_KEY |
Only needed if enabled in the event subscription page. |
ops/feishu_bridge.env is gitignored — never commit real secrets. The
template ops/feishu_bridge.env.example is the canonical reference.
The bridge needs lark-oapi and requests in the runtime Python
environment:
python -m pip install lark-oapi requestsIn one shell, bring up LarkMemoryCore (slim, kimi-only path is enough for the bridge):
bash ops/feishu_office_competition_start_kimi.sh
curl -sS http://127.0.0.1:18100/readyIn a second shell, start the bridge:
set -a; source ops/feishu_bridge.env; set +a
python ops/feishu_bridge.pyLogs stream to stdout and to
.run/feishu-office-competition/logs/feishu_bridge.log. Background usage:
nohup bash -lc 'set -a; source ops/feishu_bridge.env; set +a; \
exec python ops/feishu_bridge.py' \
>> .run/feishu-office-competition/logs/feishu_bridge.stdout.log 2>&1 < /dev/null &
disown@-mention the bot in a Feishu group (or DM it) with any prompt; expect
three log lines:
INFO received chat_type=group mentioned=True chat_id=oc_... message_id=om_... chars=N preview=...
INFO memory_hits=K latency=Xs prompt_chars=N reply_chars=M
INFO reply ok message_id=om_...
A non-mention group message logs mentioned=False followed by
observe-only ... (no reply, not mentioned) and triggers no LLM call.
sudo apt-get update
sudo apt-get install -y \
build-essential \
cmake \
ninja-build \
pkg-config \
libgrpc++-dev \
libgrpc-dev \
libprotobuf-dev \
protobuf-compiler \
protobuf-compiler-grpc \
libssl-dev \
libgtest-dev \
nlohmann-json3-dev \
python3 \
python3-pipInstall Python dependencies with the same interpreter you will use for repo commands:
python3 -m pip install -r requirements-dev.txtRun all build commands from the repository root.
Linux:
cmake --preset linux-debugCI uses:
cmake --preset ci-linuxConfigure-time checks cover:
- Ninja
pkg-config- Python
>= 3.10 - Python modules
grpc_tools,pytest,requests - C++ dependencies discovered by the compute subproject during the same configure pass
The API layer imports generated files from api_server/proto/, so Python protobuf generation is an explicit build target:
cmake --build --preset linux-debug-build --target generate_python_protoThis target runs the following underlying command:
python3 -m grpc_tools.protoc \
-I proto \
--python_out=api_server/proto \
--grpc_python_out=api_server/proto \
proto/compute.protoGenerated files:
api_server/proto/compute_pb2.pyapi_server/proto/compute_pb2_grpc.py
Build the service binary and C++ test executables from the root graph:
cmake --build --preset linux-debug-build --target compute_server compute_server_testsCanonical runtime artifact:
build/bin/compute_server
Run C++ and Python unit tests through CTest:
ctest --preset linux-debug-test --output-on-failure --label-regex "cpp|python-unit"tests/python mainly validates API behavior and contracts. These tests do not require a real model backend.
- request and response contracts
- auth and rate limiting behavior
- readiness and API-side routing behavior
- API -> gRPC prompt/response contract checks
- docs contract checks
If you are changing FastAPI handlers or API behavior, run this suite before starting the API server manually.
Direct path:
python3 -m pytest -q tests/pythonFor a local source-tree run, prepare the runtime files manually:
cp config.example.env .env
cp models.json.example models.jsonThen replace every placeholder tool.cli_path with a real executable path.
If .env points MODELS_CONFIG_FILE to a custom location, that location becomes the active runtime config.
Build the required artifacts first:
cmake --build --preset linux-debug-build --target generate_python_proto compute_serverStart the compute server:
./build/bin/compute_serverStart the API server in a second shell:
python3 -m uvicorn api_server.main:app --host 127.0.0.1 --port 8000Useful endpoints:
GET /healthGET /readyGET /v1/modelsGET /v1/models/{model_id}GET /metricsGET /v1/admin/backendsPOST /v1/admin/reload-modelsPOST /v1/memory/eventsGET /v1/memory/searchGET /v1/memory/reportGET /v1/competition/feishu-office/evidence
Competition demo helpers:
python3 -m competition.feishu_office.seed_memory_engine \
--base-url http://127.0.0.1:18100 \
--api-key "$LARK_MEMORY_CORE_API_KEY"The evidence endpoint summarizes real dataset, evaluation, and Feishu acceptance artifacts. It intentionally omits model answer bodies.
For models using buddy_deepseek_r1, the API layer does not send the raw chat JSON payload to the backend. It first renders a plain-text prompt and sends that prompt through gRPC.
That means direct CLI / direct gRPC / API result comparisons must use the same final prompt string. For example, chat input hello! is compared against backend prompt User: hello!, not against raw stdin hello!.
Before treating lark-memory-core as validated against a real model backend, compile and direct-test the actual model CLI that the runtime will launch.
For the default Buddy DeepSeek R1 path:
cd /home/huangyiheng/buddy-mlir
ninja -C build buddy-deepseek-r1-cli
printf "Say READY only.\n" | ./build/bin/buddy-deepseek-r1-cli --max-tokens=4 --no-statsThen ensure the active models.json points tool.cli_path at that executable.
These gates require:
- a running
lark-memory-coreinstance - a real model binary
- a real dataset
- a client API key if auth is enabled
Example:
cd /home/huangyiheng/src/ruyi-serving-feishu-live-20260416
export REAL_INTEGRATION_MODEL="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
export REAL_DATASET_PATH="/home/huangyiheng/src/ruyi-serving-feishu-live-20260416/tests/real_data/huangyiheng_2026_02_real.jsonl"
export REAL_INTEGRATION_BASE_URL="http://127.0.0.1:18100"
export REAL_INTEGRATION_API_KEY="$(cat /home/huangyiheng/src/ruyi-serving-feishu-live-20260416/.run/feishu-office-competition/runtime/api_key.txt)"
export REAL_INTEGRATION_MAX_SAMPLES=1
export REAL_INTEGRATION_TIMEOUT_S=180Run the real integration tests:
pytest -q -m real_integration tests/integration_realRun the real benchmark:
python3 benchmarks/real_inference_benchmark.pyThese are explicit developer gates. They are not folded into the default unit-test graph because they depend on a live service, a real model, and a real dataset.
CI uses the same root build graph as developers:
cmake --preset linux-debug
cmake --build --preset linux-debug-build --target generate_python_proto compute_server compute_server_tests
ctest --preset linux-debug-test --output-on-failure --label-regex "cpp|python-unit"That keeps configure, build, and unit-test behavior aligned across local development and automation.
examples/openai_sdk_compat.pyexamples/javascript/openai_sdk_compat.mjsexamples/postman/lark-memory-core.postman_collection.json
- Contributing: CONTRIBUTING.md
- Code of Conduct: CODE_OF_CONDUCT.md
- Security Policy: SECURITY.md
- License: Apache-2.0