29 changes: 13 additions & 16 deletions backend/app/api/simulation.py
@@ -14,33 +14,30 @@
from ..services.simulation_manager import SimulationManager, SimulationStatus
from ..services.simulation_runner import SimulationRunner, RunnerStatus
from ..utils.logger import get_logger
from ..utils.locale import t, get_locale, set_locale
from ..utils.locale import t, get_locale, set_locale, get_language_instruction
from ..models.project import ProjectManager

logger = get_logger('mirofish.api.simulation')


# Interview prompt optimization prefix
# Adding this prefix keeps the agent from calling tools so that it replies directly in text
INTERVIEW_PROMPT_PREFIX = "结合你的人设、所有的过往记忆与行动,不调用任何工具直接用文本回复我:"
# Interview prompt prefix.
# This keeps interview calls in plain text and lets the requested locale drive the answer language.
INTERVIEW_PROMPT_PREFIX = (
    "Answer directly in plain text. Use your persona, all prior memories, prior actions, "
    "and the simulation context. Do not call tools. Do not return JSON. "
    "Follow this language instruction: {language_instruction}\n\nQuestion: "
)


def optimize_interview_prompt(prompt: str) -> str:
    """
    Optimize the interview question by adding a prefix that keeps the agent from calling tools

    Args:
        prompt: the original question

    Returns:
        The optimized question
    """
    """Add a plain-text interview prefix so agents answer instead of calling tools."""
    if not prompt:
        return prompt
    # Avoid adding the prefix twice
    if prompt.startswith(INTERVIEW_PROMPT_PREFIX):
    if prompt.startswith("Answer directly in plain text."):
        return prompt
    return f"{INTERVIEW_PROMPT_PREFIX}{prompt}"
    return INTERVIEW_PROMPT_PREFIX.format(
        language_instruction=get_language_instruction()
    ) + prompt
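
For reviewers who want to check the new prefixing behavior in isolation, here is a minimal sketch. It assumes a stubbed `get_language_instruction` that returns a fixed, hypothetical string; the real helper lives in `..utils.locale` and is not shown in this diff.

```python
# Sketch only: get_language_instruction below is a stand-in stub, not the real helper.
INTERVIEW_PROMPT_PREFIX = (
    "Answer directly in plain text. Use your persona, all prior memories, prior actions, "
    "and the simulation context. Do not call tools. Do not return JSON. "
    "Follow this language instruction: {language_instruction}\n\nQuestion: "
)


def get_language_instruction() -> str:
    """Hypothetical stub; the real function presumably depends on the active locale."""
    return "Respond in English."


def optimize_interview_prompt(prompt: str) -> str:
    """Prefix a question once; empty and already-prefixed prompts pass through."""
    if not prompt:
        return prompt
    # The template contains a placeholder, so the check matches its literal
    # opening sentence rather than the whole template.
    if prompt.startswith("Answer directly in plain text."):
        return prompt
    return INTERVIEW_PROMPT_PREFIX.format(
        language_instruction=get_language_instruction()
    ) + prompt


once = optimize_interview_prompt("What do you think of {this}?")
twice = optimize_interview_prompt(once)  # second call leaves the prompt unchanged
```

Because the question is concatenated after `.format(...)` runs, braces inside the question are not interpreted as placeholders. A question that happens to start with "Answer directly in plain text." would be passed through without a prefix.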


# ============== Entity read endpoints ==============
192 changes: 44 additions & 148 deletions backend/app/services/ontology_generator.py
@@ -27,149 +27,45 @@ def _to_pascal_case(name: str) -> str:


# System prompt for ontology generation
ONTOLOGY_SYSTEM_PROMPT = """你是一个专业的知识图谱本体设计专家。你的任务是分析给定的文本内容和模拟需求,设计适合**社交媒体舆论模拟**的实体类型和关系类型。
ONTOLOGY_SYSTEM_PROMPT = """You are a knowledge-graph ontology designer for social simulation. Analyze the uploaded material and the simulation requirement, then design entity types and relationship types that fit a social-media-style multi-agent simulation.

**重要:你必须输出有效的JSON格式数据,不要输出任何其他内容。**
You must return valid JSON only. Do not include Markdown, comments, or explanatory text outside JSON.

## 核心任务背景
Context: every entity type should represent an actor that can speak, react, influence others, or be represented by an account in the simulation. Suitable entities include people, roles, companies, institutions, regulators, media organizations, platforms, professional groups, client groups, and other real-world actor categories. Avoid abstract concepts, topics, emotions, and attitudes as entity types.

我们正在构建一个**社交媒体舆论模拟系统**。在这个系统中:
- 每个实体都是一个可以在社交媒体上发声、互动、传播信息的"账号"或"主体"
- 实体之间会相互影响、转发、评论、回应
- 我们需要模拟舆论事件中各方的反应和信息传播路径

因此,**实体必须是现实中真实存在的、可以在社媒上发声和互动的主体**:

**可以是**:
- 具体的个人(公众人物、当事人、意见领袖、专家学者、普通人)
- 公司、企业(包括其官方账号)
- 组织机构(大学、协会、NGO、工会等)
- 政府部门、监管机构
- 媒体机构(报纸、电视台、自媒体、网站)
- 社交媒体平台本身
- 特定群体代表(如校友会、粉丝团、维权群体等)

**不可以是**:
- 抽象概念(如"舆论"、"情绪"、"趋势")
- 主题/话题(如"学术诚信"、"教育改革")
- 观点/态度(如"支持方"、"反对方")

## 输出格式

请输出JSON格式,包含以下结构:

```json
Output schema:
{
"entity_types": [
{
"name": "实体类型名称(英文,PascalCase)",
"description": "简短描述(英文,不超过100字符)",
"attributes": [
{
"name": "属性名(英文,snake_case)",
"type": "text",
"description": "属性描述"
}
],
"examples": ["示例实体1", "示例实体2"]
}
],
"edge_types": [
{
"name": "关系类型名称(英文,UPPER_SNAKE_CASE)",
"description": "简短描述(英文,不超过100字符)",
"source_targets": [
{"source": "源实体类型", "target": "目标实体类型"}
],
"attributes": []
}
],
"analysis_summary": "对文本内容的简要分析说明"
"entity_types": [
{
"name": "EntityTypeNameInEnglishPascalCase",
"description": "Short description in the target language",
"attributes": [
{"name": "attribute_name", "type": "text", "description": "Attribute description in the target language"}
],
"examples": ["Example 1", "Example 2"]
}
],
"edge_types": [
{
"name": "RELATIONSHIP_TYPE_IN_UPPER_SNAKE_CASE",
"description": "Short description in the target language",
"source_targets": [{"source": "SourceEntityType", "target": "TargetEntityType"}],
"attributes": []
}
],
"analysis_summary": "Brief analysis in the target language"
}
```

## 设计指南(极其重要!)

### 1. 实体类型设计 - 必须严格遵守

**数量要求:必须正好10个实体类型**

**层次结构要求(必须同时包含具体类型和兜底类型)**:

你的10个实体类型必须包含以下层次:

A. **兜底类型(必须包含,放在列表最后2个)**:
- `Person`: 任何自然人个体的兜底类型。当一个人不属于其他更具体的人物类型时,归入此类。
- `Organization`: 任何组织机构的兜底类型。当一个组织不属于其他更具体的组织类型时,归入此类。

B. **具体类型(8个,根据文本内容设计)**:
- 针对文本中出现的主要角色,设计更具体的类型
- 例如:如果文本涉及学术事件,可以有 `Student`, `Professor`, `University`
- 例如:如果文本涉及商业事件,可以有 `Company`, `CEO`, `Employee`

**为什么需要兜底类型**:
- 文本中会出现各种人物,如"中小学教师"、"路人甲"、"某位网友"
- 如果没有专门的类型匹配,他们应该被归入 `Person`
- 同理,小型组织、临时团体等应该归入 `Organization`

**具体类型的设计原则**:
- 从文本中识别出高频出现或关键的角色类型
- 每个具体类型应该有明确的边界,避免重叠
- description 必须清晰说明这个类型和兜底类型的区别

### 2. 关系类型设计

- 数量:6-10个
- 关系应该反映社媒互动中的真实联系
- 确保关系的 source_targets 涵盖你定义的实体类型

### 3. 属性设计

- 每个实体类型1-3个关键属性
- **注意**:属性名不能使用 `name`、`uuid`、`group_id`、`created_at`、`summary`(这些是系统保留字)
- 推荐使用:`full_name`, `title`, `role`, `position`, `location`, `description` 等

## 实体类型参考

**个人类(具体)**:
- Student: 学生
- Professor: 教授/学者
- Journalist: 记者
- Celebrity: 明星/网红
- Executive: 高管
- Official: 政府官员
- Lawyer: 律师
- Doctor: 医生

**个人类(兜底)**:
- Person: 任何自然人(不属于上述具体类型时使用)

**组织类(具体)**:
- University: 高校
- Company: 公司企业
- GovernmentAgency: 政府机构
- MediaOutlet: 媒体机构
- Hospital: 医院
- School: 中小学
- NGO: 非政府组织

**组织类(兜底)**:
- Organization: 任何组织机构(不属于上述具体类型时使用)

## 关系类型参考

- WORKS_FOR: 工作于
- STUDIES_AT: 就读于
- AFFILIATED_WITH: 隶属于
- REPRESENTS: 代表
- REGULATES: 监管
- REPORTS_ON: 报道
- COMMENTS_ON: 评论
- RESPONDS_TO: 回应
- SUPPORTS: 支持
- OPPOSES: 反对
- COLLABORATES_WITH: 合作
- COMPETES_WITH: 竞争
Design rules:
- Create exactly 10 entity types.
- The last two entity types must be Person and Organization as fallback types.
- Create 8 specific entity types that fit the uploaded material.
- Create 6 to 10 relationship types.
- Entity type names must be English PascalCase.
- Relationship names must be English UPPER_SNAKE_CASE.
- Attribute names must be English snake_case.
- Do not use reserved attribute names: name, uuid, group_id, created_at, summary.
- Natural-language descriptions, examples where possible, and analysis_summary must follow the requested response language.
"""
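
The shortened prompt states its design rules only in prose, so a post-generation check may be useful. The following is a sketch under assumptions, not code from this PR: `validate_ontology` and the regex constants are hypothetical names, and the check accepts Person and Organization in either order because the rule does not fix one.

```python
import re

# Reserved attribute names listed in the prompt's design rules.
RESERVED_ATTRIBUTES = {"name", "uuid", "group_id", "created_at", "summary"}
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
UPPER_SNAKE_CASE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def validate_ontology(ontology: dict) -> list[str]:
    """Return rule violations; an empty list means the output follows the rules."""
    errors = []
    entities = ontology.get("entity_types", [])
    edges = ontology.get("edge_types", [])

    if len(entities) != 10:
        errors.append(f"expected exactly 10 entity types, got {len(entities)}")
    # Order of the two fallback types is not enforced here (an assumption).
    if {e.get("name") for e in entities[-2:]} != {"Person", "Organization"}:
        errors.append("last two entity types must be Person and Organization")
    if not 6 <= len(edges) <= 10:
        errors.append(f"expected 6 to 10 relationship types, got {len(edges)}")

    for entity in entities:
        entity_name = entity.get("name", "")
        if not PASCAL_CASE.match(entity_name):
            errors.append(f"entity type name is not PascalCase: {entity_name!r}")
        for attr in entity.get("attributes", []):
            attr_name = attr.get("name", "")
            if attr_name in RESERVED_ATTRIBUTES:
                errors.append(f"reserved attribute name on {entity_name}: {attr_name!r}")
            elif not SNAKE_CASE.match(attr_name):
                errors.append(f"attribute name is not snake_case: {attr_name!r}")

    for edge in edges:
        edge_name = edge.get("name", "")
        if not UPPER_SNAKE_CASE.match(edge_name):
            errors.append(f"relationship name is not UPPER_SNAKE_CASE: {edge_name!r}")

    return errors
```

Returning a list of messages rather than raising on the first failure would let a caller log every violation or feed them back to the model in a retry.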


@@ -245,31 +141,31 @@ def _build_user_message(
combined_text = combined_text[:self.MAX_TEXT_LENGTH_FOR_LLM]
combined_text += f"\n\n...(原文共{original_length}字,已截取前{self.MAX_TEXT_LENGTH_FOR_LLM}字用于本体分析)..."

message = f"""## 模拟需求
message = f"""## Simulation requirement

{simulation_requirement}

## 文档内容
## Uploaded material

{combined_text}
"""

if additional_context:
message += f"""
## 额外说明
## Additional context

{additional_context}
"""

message += """
请根据以上内容,设计适合社会舆论模拟的实体类型和关系类型。

**必须遵守的规则**:
1. 必须正好输出10个实体类型
2. 最后2个必须是兜底类型:Person(个人兜底)和 Organization(组织兜底)
3. 前8个是根据文本内容设计的具体类型
4. 所有实体类型必须是现实中可以发声的主体,不能是抽象概念
5. 属性名不能使用 name、uuid、group_id 等保留字,用 full_name、org_name 等替代
Based on the material above, design entity and relationship types for the simulation. Return JSON only.

Mandatory rules:
1. Output exactly 10 entity types.
2. The last 2 entity types must be fallback types: Person and Organization.
3. The first 8 entity types must be specific to the provided material.
4. Every entity type must be a real-world actor that can plausibly speak or act, not an abstract concept.
5. Do not use reserved attribute names such as name, uuid, or group_id. Use alternatives such as full_name or org_name.
"""

return message