ai-robot-core/spec/ai-service/iterations/v0.8.0-intent-hybrid-routing/design.md

feature_id: AISVC
iteration_id: v0.8.0-intent-hybrid-routing
title: Intent Recognition Hybrid Routing Optimization - Technical Design
status: draft
version: 0.8.0
created_at: 2026-03-08
inputs:
  - spec/ai-service/iterations/v0.8.0-intent-hybrid-routing/requirements.md
  - spec/ai-service/iterations/v0.8.0-intent-hybrid-routing/scope.md

Intent Recognition Hybrid Routing Optimization - Design v0.8.0

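1. Design Goals and Constraints

1.1 Design Goals

  • Upgrade intent recognition from single rule matching to three-way hybrid routing: rules + semantics + LLM
  • Improve the recall and accuracy of intent recognition
  • Provide confidence scores and route trace logs
  • Minimal intrusion: insert hybrid routing only at Step 3 and leave the main pipeline unchanged

1.2 Hard Constraints

  • The existing rule engine remains available and serves as one input to hybrid routing
  • The external response semantics of /ai/chat stay unchanged
  • All new logic must be isolated by tenantId
  • Keep the IntentRouter.match() method for backward compatibility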

2. Architecture

2.1 Minimal-Intrusion Architecture Diagram

┌──────────────────────────────────────────────────────────────────────────────┐
│                        Orchestrator 12-Step Pipeline                         │
│                                                                              │
│  Step 1: InputScanner → Step 2: FlowEngine → Step 3: IntentRouter [modified] │
│                                                      │                       │
│                                                      ▼                       │
│  ┌──────────────────────────────────────────────────────────────────────────┐│
│  │                      IntentRouter (Hybrid Routing)                       ││
│  │                                                                          ││
│  │  ┌──────────────────────────────────────────────────────────────────────┐││
│  │  │                       Parallel Matching Layer                        │││
│  │  │                                                                      │││
│  │  │   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐    │││
│  │  │   │   RuleMatcher   │   │ SemanticMatcher │   │    LlmJudge     │    │││
│  │  │   │ (existing+score)│   │      (new)      │   │  (conditional)  │    │││
│  │  │   │                 │   │                 │   │                 │    │││
│  │  │   │ keywords        │   │ embedding       │   │ LLM call        │    │││
│  │  │   │ regex           │   │ similarity      │   │ arbitration     │    │││
│  │  │   │ score: 0|1      │   │ score: 0~1      │   │ score: 0~1      │    │││
│  │  │   └────────┬────────┘   └────────┬────────┘   └────────┬────────┘    │││
│  │  │            │                     │                     │             │││
│  │  │            └─────────────────────┼─────────────────────┘             │││
│  │  │                                  ▼                                   │││
│  │  └──────────────────────────────────────────────────────────────────────┘││
│  │                                     │                                    ││
│  │                                     ▼                                    ││
│  │  ┌──────────────────────────────────────────────────────────────────────┐││
│  │  │                          FusionPolicy (new)                          │││
│  │  │                                                                      │││
│  │  │   Input:   rule_result, semantic_result, llm_result                  │││
│  │  │   Process: weighted fusion + conflict detection + threshold check    │││
│  │  │   Output:  final_intent, final_confidence, decision_reason, trace    │││
│  │  │                                                                      │││
│  │  └──────────────────────────────────────────────────────────────────────┘││
│  │                                                                          ││
│  └──────────────────────────────────────────────────────────────────────────┘│
│                                        │                                     │
│                                        ▼                                     │
│                      response_type routing (unchanged)                       │
│                        fixed / rag / flow / transfer                         │
└──────────────────────────────────────────────────────────────────────────────┘

2.2 Insertion Points

Insertion point | Location | Change
Step 3 entry | orchestrator.py:500 | Call IntentRouter.match_hybrid() instead of match()
IntentRule entity | entities.py:420-463 | Add the intent_vector and semantic_examples fields
IntentRouter class | intent/router.py | Add match_hybrid(); keep match() for backward compatibility
New modules | intent/ | Add semantic_matcher.py, llm_judge.py, fusion_policy.py, models.py

3. Core Interface Design

3.1 Data Models

# Location: app/services/intent/models.py

from dataclasses import dataclass, field
from typing import Any
import uuid

@dataclass
class RuleMatchResult:
    """Rule match result."""
    rule_id: uuid.UUID | None
    rule: "IntentRule | None"
    match_type: str | None      # "keyword" | "regex" | None
    matched_text: str | None
    score: float                # 1.0 or 0.0
    duration_ms: int

@dataclass
class SemanticCandidate:
    """Semantic match candidate."""
    rule: "IntentRule"
    score: float                # similarity in 0.0 ~ 1.0

@dataclass
class SemanticMatchResult:
    """Semantic match result."""
    candidates: list[SemanticCandidate]  # top-N candidates
    top_score: float
    duration_ms: int
    skipped: bool               # whether matching was skipped (e.g. no semantic config)
    skip_reason: str | None     # why it was skipped

@dataclass
class LlmJudgeInput:
    """LLM arbitration input."""
    message: str
    candidates: list[dict]      # candidate intents
    conflict_type: str          # "rule_semantic_conflict" | "gray_zone" | "multi_intent"

@dataclass
class LlmJudgeResult:
    """LLM arbitration result."""
    intent_id: str | None
    intent_name: str | None
    score: float                # 0.0 ~ 1.0
    reasoning: str | None       # the LLM's reasoning
    duration_ms: int
    tokens_used: int
    triggered: bool

@dataclass
class FusionConfig:
    """Fusion configuration."""
    w_rule: float = 0.5
    w_semantic: float = 0.3
    w_llm: float = 0.2
    semantic_threshold: float = 0.7
    conflict_threshold: float = 0.2
    gray_zone_threshold: float = 0.6
    min_trigger_threshold: float = 0.3
    clarify_threshold: float = 0.4
    multi_intent_threshold: float = 0.15
    llm_judge_enabled: bool = True
    semantic_matcher_enabled: bool = True
    semantic_matcher_timeout_ms: int = 100   # read by SemanticMatcher.match (see 8.2)
    llm_judge_timeout_ms: int = 2000         # read by LlmJudge.judge (see 8.2)

@dataclass
class RouteTrace:
    """Route trace log."""
    rule_match: dict = field(default_factory=dict)
    semantic_match: dict = field(default_factory=dict)
    llm_judge: dict = field(default_factory=dict)
    fusion: dict = field(default_factory=dict)

@dataclass
class FusionResult:
    """Fusion decision result."""
    final_intent: "IntentRule | None"
    final_confidence: float
    decision_reason: str
    need_clarify: bool
    clarify_candidates: list["IntentRule"] | None
    trace: RouteTrace

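3.2 RuleMatcher (refactored from the existing IntentRouter)

# Location: app/services/intent/router.py

import time

class RuleMatcher:
    """Rule matcher (based on the existing IntentRouter)."""
    
    def match(self, message: str, rules: list[IntentRule]) -> RuleMatchResult:
        """
        Keyword + regex matching.
        
        Matching algorithm:
        1. Iterate over the rules in descending priority order
        2. For each rule, try keyword matching first
        3. If no keyword matches, try regex pattern matching
        4. Return the first match (highest priority)
        
        Args:
            message: user message
            rules: rule list (already sorted by descending priority)
            
        Returns:
            RuleMatchResult: the match result
        """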
        start_time = time.time()
        message_lower = message.lower()
        
        for rule in rules:
            if not rule.is_enabled:
                continue
            
            result = self._match_keywords(message, message_lower, rule)
            if result:
                duration_ms = int((time.time() - start_time) * 1000)
                return RuleMatchResult(
                    rule_id=rule.id,
                    rule=rule,
                    match_type="keyword",
                    matched_text=result.matched,
                    score=1.0,
                    duration_ms=duration_ms
                )
            
            result = self._match_patterns(message, rule)
            if result:
                duration_ms = int((time.time() - start_time) * 1000)
                return RuleMatchResult(
                    rule_id=rule.id,
                    rule=rule,
                    match_type="regex",
                    matched_text=result.matched,
                    score=1.0,
                    duration_ms=duration_ms
                )
        
        duration_ms = int((time.time() - start_time) * 1000)
        return RuleMatchResult(
            rule_id=None,
            rule=None,
            match_type=None,
            matched_text=None,
            score=0.0,
            duration_ms=duration_ms
        )
    
    def _match_keywords(self, message: str, message_lower: str, rule: IntentRule) -> IntentMatchResult | None:
        """Keyword matching (existing logic retained)."""
        pass
    
    def _match_patterns(self, message: str, rule: IntentRule) -> IntentMatchResult | None:
        """Regex matching (existing logic retained)."""
        pass
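
The two stubs above keep the existing logic, which this document does not show. As a rough illustration of the first-match-wins algorithm described in the docstring, here is a standalone sketch. The `Rule` class and its `keywords` / `patterns` fields are assumptions made for the example, not the real `IntentRule` schema.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical stand-in for IntentRule; field names are assumed."""
    name: str
    priority: int
    keywords: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)


def first_match(message: str, rules: list[Rule]) -> tuple[str, str, str] | None:
    """Return (rule name, match type, matched text) for the first hit, or None."""
    lowered = message.lower()
    # Highest priority first, so the first hit is also the highest-priority hit.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        # Keywords are tried before regex patterns within the same rule.
        for keyword in rule.keywords:
            if keyword.lower() in lowered:
                return rule.name, "keyword", keyword
        for pattern in rule.patterns:
            found = re.search(pattern, message)
            if found:
                return rule.name, "regex", found.group(0)
    return None


RULES = [
    Rule("refund", priority=10, keywords=["refund"]),
    Rule("order_status", priority=20, patterns=[r"order\s*#?\d+"]),
]
```

With these two sample rules, a message that mentions both an order number and a refund resolves to `order_status`, because that rule has the higher priority and is checked first.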

3.3 SemanticMatcher (new)

# Location: app/services/intent/semantic_matcher.py

import asyncio
import time
from typing import Any
import numpy as np

class SemanticMatcher:
    """Semantic matcher."""
    
    def __init__(
        self,
        embedding_provider: EmbeddingProvider,
        config: FusionConfig
    ):
        self._embedding_provider = embedding_provider
        self._config = config
    
    async def match(
        self,
        message: str,
        rules: list[IntentRule],
        tenant_id: str,
        top_k: int = 3
    ) -> SemanticMatchResult:
        """
        Vector-based semantic matching.
        
        Matching modes:
        - Mode A: compute similarity directly against the rule's precomputed intent_vector
        - Mode B: embed the rule's semantic_examples on the fly and take the highest similarity
        
        Args:
            message: user message
            rules: rule list
            tenant_id: tenant ID
            top_k: number of candidates to return
            
        Returns:
            SemanticMatchResult: the match result
        """
        start_time = time.time()
        
        if not self._config.semantic_matcher_enabled:
            return SemanticMatchResult(
                candidates=[],
                top_score=0.0,
                duration_ms=0,
                skipped=True,
                skip_reason="disabled"
            )
        
        rules_with_semantic = [r for r in rules if self._has_semantic_config(r)]
        if not rules_with_semantic:
            duration_ms = int((time.time() - start_time) * 1000)
            return SemanticMatchResult(
                candidates=[],
                top_score=0.0,
                duration_ms=duration_ms,
                skipped=True,
                skip_reason="no_semantic_config"
            )
        
        try:
            message_vector = await asyncio.wait_for(
                self._embedding_provider.embed(message),
                timeout=self._config.semantic_matcher_timeout_ms / 1000
            )
        except asyncio.TimeoutError:
            duration_ms = int((time.time() - start_time) * 1000)
            return SemanticMatchResult(
                candidates=[],
                top_score=0.0,
                duration_ms=duration_ms,
                skipped=True,
                skip_reason="embedding_timeout"
            )
        except Exception as e:
            duration_ms = int((time.time() - start_time) * 1000)
            return SemanticMatchResult(
                candidates=[],
                top_score=0.0,
                duration_ms=duration_ms,
                skipped=True,
                skip_reason=f"embedding_error: {str(e)}"
            )
        
        candidates = []
        for rule in rules_with_semantic:
            score = await self._calculate_similarity(message_vector, rule)
            if score > 0:
                candidates.append(SemanticCandidate(rule=rule, score=score))
        
        candidates.sort(key=lambda x: x.score, reverse=True)
        candidates = candidates[:top_k]
        
        duration_ms = int((time.time() - start_time) * 1000)
        return SemanticMatchResult(
            candidates=candidates,
            top_score=candidates[0].score if candidates else 0.0,
            duration_ms=duration_ms,
            skipped=False,
            skip_reason=None
        )
    
    def _has_semantic_config(self, rule: IntentRule) -> bool:
        """Check whether the rule has any semantic configuration."""
        return bool(rule.intent_vector) or bool(rule.semantic_examples)
    
    async def _calculate_similarity(self, message_vector: list[float], rule: IntentRule) -> float:
        """Compute the similarity between the message and a rule."""
        if rule.intent_vector:
            return self._cosine_similarity(message_vector, rule.intent_vector)
        elif rule.semantic_examples:
            example_vectors = await self._embedding_provider.embed_batch(rule.semantic_examples)
            similarities = [
                self._cosine_similarity(message_vector, v)
                for v in example_vectors
            ]
            return max(similarities) if similarities else 0.0
        return 0.0
    
    def _cosine_similarity(self, v1: list[float], v2: list[float]) -> float:
        """Compute cosine similarity; returns 0.0 if either vector has zero norm."""
        v1_arr = np.array(v1)
        v2_arr = np.array(v2)
        norm = np.linalg.norm(v1_arr) * np.linalg.norm(v2_arr)
        if norm == 0:
            return 0.0
        return float(np.dot(v1_arr, v2_arr) / norm)
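
The scoring and top-k ranking performed in `match` can be tried in isolation. The sketch below uses pure Python instead of numpy so it runs without dependencies; the function names and the mapping from rule ID to vector are illustrative assumptions.

```python
from __future__ import annotations

import math


def cosine_similarity(v1: list[float], v2: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm > 0 else 0.0


def rank_rules(
    message_vector: list[float],
    rule_vectors: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Score every rule vector, drop non-positive scores, keep the top k."""
    scored = [
        (rule_id, cosine_similarity(message_vector, vector))
        for rule_id, vector in rule_vectors.items()
    ]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, `rank_rules([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2)` ranks `a` first and `c` second; `b` is orthogonal to the message, scores 0, and is filtered out.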

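3.4 LlmJudge (new)

# Location: app/services/intent/llm_judge.py

import asyncio
import time

class LlmJudge:
    """LLM arbiter."""
    
    JUDGE_PROMPT = """You are an intent recognition arbiter. Given the user message and the candidate intents, decide which intent matches best.

User message: {message}

Candidate intents:
{candidates}

Return JSON in this format:
{{
  "intent_id": "ID of the best-matching intent",
  "intent_name": "intent name",
  "confidence": a confidence value between 0.0 and 1.0,
  "reasoning": "reason for the decision"
}}
"""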
    
    def __init__(
        self,
        llm_client: LLMClient,
        config: FusionConfig
    ):
        self._llm_client = llm_client
        self._config = config
    
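    def should_trigger(
        self,
        rule_result: RuleMatchResult,
        semantic_result: SemanticMatchResult,
        config: FusionConfig
    ) -> tuple[bool, str]:
        """
        Decide whether to trigger the LLM judge.
        
        Trigger conditions:
        1. Conflict: RuleMatcher and SemanticMatcher hit different intents
        2. Gray zone: the highest confidence falls inside the gray-zone range
        3. Multi-intent: several candidate intents have close confidence scores
        
        Args:
            rule_result: rule match result
            semantic_result: semantic match result
            config: fusion configuration
            
        Returns:
            (whether to trigger, trigger reason)
        """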
        if not config.llm_judge_enabled:
            return False, "disabled"
        
        rule_score = rule_result.score
        semantic_score = semantic_result.top_score
        
        if rule_score > 0 and semantic_score > 0:
            if rule_result.rule_id != semantic_result.candidates[0].rule.id:
                if abs(rule_score - semantic_score) < config.conflict_threshold:
                    return True, "rule_semantic_conflict"
        
        max_score = max(rule_score, semantic_score)
        if config.min_trigger_threshold < max_score < config.gray_zone_threshold:
            return True, "gray_zone"
        
        if len(semantic_result.candidates) >= 2:
            top1_score = semantic_result.candidates[0].score
            top2_score = semantic_result.candidates[1].score
            if abs(top1_score - top2_score) < config.multi_intent_threshold:
                return True, "multi_intent"
        
        return False, ""
    
    async def judge(
        self,
        input: LlmJudgeInput,
        tenant_id: str
    ) -> LlmJudgeResult:
        """
        LLM arbitration.
        
        Args:
            input: arbitration input
            tenant_id: tenant ID
            
        Returns:
            LlmJudgeResult: the arbitration result
        """
        start_time = time.time()
        
        candidates_text = "\n".join([
            f"- ID: {c['id']}, name: {c['name']}, description: {c.get('description', 'N/A')}"
            for c in input.candidates
        ])
        
        prompt = self.JUDGE_PROMPT.format(
            message=input.message,
            candidates=candidates_text
        )
        
        try:
            response = await asyncio.wait_for(
                self._llm_client.generate(
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=200,
                    temperature=0
                ),
                timeout=self._config.llm_judge_timeout_ms / 1000
            )
            
            result = self._parse_response(response.content)
            duration_ms = int((time.time() - start_time) * 1000)
            
            return LlmJudgeResult(
                intent_id=result.get("intent_id"),
                intent_name=result.get("intent_name"),
                score=result.get("confidence", 0.5),
                reasoning=result.get("reasoning"),
                duration_ms=duration_ms,
                tokens_used=response.total_tokens,
                triggered=True
            )
            
        except asyncio.TimeoutError:
            duration_ms = int((time.time() - start_time) * 1000)
            return LlmJudgeResult(
                intent_id=None,
                intent_name=None,
                score=0.0,
                reasoning="LLM timeout",
                duration_ms=duration_ms,
                tokens_used=0,
                triggered=True
            )
        except Exception as e:
            duration_ms = int((time.time() - start_time) * 1000)
            return LlmJudgeResult(
                intent_id=None,
                intent_name=None,
                score=0.0,
                reasoning=f"LLM error: {str(e)}",
                duration_ms=duration_ms,
                tokens_used=0,
                triggered=True
            )
    
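    def _parse_response(self, content: str) -> dict:
        """Parse the LLM response; on failure, report reasoning="parse_failed"."""
        import json
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError:
            return {"reasoning": "parse_failed"}
        # Valid JSON that is not an object (e.g. a list) is also a parse failure.
        return parsed if isinstance(parsed, dict) else {"reasoning": "parse_failed"}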
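
The three trigger conditions in `should_trigger` are easier to reason about with concrete numbers. The sketch below restates them over plain values, using the default thresholds from `FusionConfig`; the function name and the `(rule_id, score)` tuple layout are assumptions made for the example.

```python
from __future__ import annotations


def trigger_reason(
    rule_score: float,
    rule_id: str | None,
    semantic: list[tuple[str, float]],  # (rule_id, score), sorted descending
    *,
    conflict_threshold: float = 0.2,
    min_trigger_threshold: float = 0.3,
    gray_zone_threshold: float = 0.6,
    multi_intent_threshold: float = 0.15,
) -> str:
    """Return the trigger reason, or "" when the LLM judge is not needed."""
    top_id, top_score = semantic[0] if semantic else (None, 0.0)

    # 1. Conflict: both paths hit, on different intents, with close scores.
    if rule_score > 0 and top_score > 0 and rule_id != top_id:
        if abs(rule_score - top_score) < conflict_threshold:
            return "rule_semantic_conflict"

    # 2. Gray zone: the best score is neither clearly low nor clearly high.
    if min_trigger_threshold < max(rule_score, top_score) < gray_zone_threshold:
        return "gray_zone"

    # 3. Multi-intent: the top two semantic candidates are nearly tied.
    if len(semantic) >= 2:
        if abs(semantic[0][1] - semantic[1][1]) < multi_intent_threshold:
            return "multi_intent"

    return ""
```

With the defaults, a rule hit on `refund` against a semantic top hit on `cancel` at 0.85 is a conflict (difference 0.15), a lone semantic score of 0.45 lands in the gray zone, and semantic scores of 0.82 and 0.75 count as multi-intent.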

3.5 FusionPolicy (new)

# Location: app/services/intent/fusion_policy.py

class FusionPolicy:
    """Fusion decision policy."""
    
    DECISION_PRIORITY = [
        ("rule_high_confidence", lambda r, s, l: r.score == 1.0 and r.rule is not None),
        ("llm_judge", lambda r, s, l: l.triggered and l.intent_id is not None),
        ("semantic_override", lambda r, s, l: r.score == 0 and s.top_score > 0.7),
        ("rule_semantic_agree", lambda r, s, l: r.score > 0 and s.top_score > 0.5 and r.rule_id == s.candidates[0].rule.id if s.candidates else False),
        ("semantic_fallback", lambda r, s, l: s.top_score > 0.5),
        ("rule_fallback", lambda r, s, l: r.score > 0),
        ("no_match", lambda r, s, l: True),
    ]
    
    def __init__(self, config: FusionConfig):
        self._config = config
    
    def fuse(
        self,
        rule_result: RuleMatchResult,
        semantic_result: SemanticMatchResult,
        llm_result: LlmJudgeResult | None
    ) -> FusionResult:
        """
        Fusion decision.
        
        Args:
            rule_result: rule match result
            semantic_result: semantic match result
            llm_result: LLM arbitration result (may be None)
            
        Returns:
            FusionResult: the fused result
        """
        trace = RouteTrace(
            rule_match={
                "rule_id": str(rule_result.rule_id) if rule_result.rule_id else None,
                "match_type": rule_result.match_type,
                "matched_text": rule_result.matched_text,
                "score": rule_result.score,
                "duration_ms": rule_result.duration_ms
            },
            semantic_match={
                "top_candidates": [
                    {"rule_id": str(c.rule.id), "name": c.rule.name, "score": c.score}
                    for c in semantic_result.candidates
                ],
                "top_score": semantic_result.top_score,
                "duration_ms": semantic_result.duration_ms,
                "skipped": semantic_result.skipped,
                "skip_reason": semantic_result.skip_reason
            },
            llm_judge={
                "triggered": llm_result.triggered if llm_result else False,
                "intent_id": llm_result.intent_id if llm_result else None,
                "score": llm_result.score if llm_result else 0.0,
                "duration_ms": llm_result.duration_ms if llm_result else 0,
                "tokens_used": llm_result.tokens_used if llm_result else 0
            },
            fusion={}
        )
        
        final_intent = None
        final_confidence = 0.0
        decision_reason = "no_match"
        
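        # When the LLM judge did not run, evaluate the conditions against an
        # untriggered placeholder so the lambdas can read .triggered safely.
        llm_for_rules = llm_result or LlmJudgeResult(
            intent_id=None, intent_name=None, score=0.0, reasoning=None,
            duration_ms=0, tokens_used=0, triggered=False
        )
        for reason, condition in self.DECISION_PRIORITY:
            if condition(rule_result, semantic_result, llm_for_rules):
                decision_reason = reason
                break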
        
        if decision_reason == "rule_high_confidence":
            final_intent = rule_result.rule
            final_confidence = 1.0
        elif decision_reason == "llm_judge" and llm_result:
            final_intent = self._find_rule_by_id(llm_result.intent_id, rule_result, semantic_result)
            final_confidence = llm_result.score
        elif decision_reason == "semantic_override":
            final_intent = semantic_result.candidates[0].rule
            final_confidence = semantic_result.top_score
        elif decision_reason == "rule_semantic_agree":
            final_intent = rule_result.rule
            final_confidence = self._calculate_weighted_confidence(rule_result, semantic_result, llm_result)
        elif decision_reason == "semantic_fallback":
            final_intent = semantic_result.candidates[0].rule
            final_confidence = semantic_result.top_score
        elif decision_reason == "rule_fallback":
            final_intent = rule_result.rule
            final_confidence = rule_result.score
        
        need_clarify = final_confidence < self._config.clarify_threshold
        clarify_candidates = None
        if need_clarify and len(semantic_result.candidates) > 1:
            clarify_candidates = [c.rule for c in semantic_result.candidates[:3]]
        
        trace.fusion = {
            "weights": {
                "w_rule": self._config.w_rule,
                "w_semantic": self._config.w_semantic,
                "w_llm": self._config.w_llm
            },
            "final_confidence": final_confidence,
            "decision_reason": decision_reason
        }
        
        return FusionResult(
            final_intent=final_intent,
            final_confidence=final_confidence,
            decision_reason=decision_reason,
            need_clarify=need_clarify,
            clarify_candidates=clarify_candidates,
            trace=trace
        )
    
    def _calculate_weighted_confidence(
        self,
        rule_result: RuleMatchResult,
        semantic_result: SemanticMatchResult,
        llm_result: LlmJudgeResult | None
    ) -> float:
        """Compute the weighted confidence."""
        rule_score = rule_result.score
        semantic_score = semantic_result.top_score if not semantic_result.skipped else 0.0
        llm_score = llm_result.score if llm_result and llm_result.triggered else 0.0
        
        total_weight = self._config.w_rule + self._config.w_semantic
        if llm_result and llm_result.triggered:
            total_weight += self._config.w_llm
        
        confidence = (
            self._config.w_rule * rule_score +
            self._config.w_semantic * semantic_score +
            self._config.w_llm * llm_score
        ) / total_weight
        
        return min(1.0, max(0.0, confidence))
    
    def _find_rule_by_id(
        self,
        intent_id: str | None,
        rule_result: RuleMatchResult,
        semantic_result: SemanticMatchResult
    ) -> IntentRule | None:
        """Look up a rule by ID among the rule hit and the semantic candidates."""
        if not intent_id:
            return None
        
        if rule_result.rule_id and str(rule_result.rule_id) == intent_id:
            return rule_result.rule
        
        for candidate in semantic_result.candidates:
            if str(candidate.rule.id) == intent_id:
                return candidate.rule
        
        return None

3.6 IntentRouter Upgrade

# Location: app/services/intent/router.py

import asyncio

class IntentRouter:
    """Intent router (upgraded)."""
    
    def __init__(
        self,
        rule_matcher: RuleMatcher,
        semantic_matcher: SemanticMatcher,
        llm_judge: LlmJudge,
        fusion_policy: FusionPolicy,
        config: FusionConfig | None = None
    ):
        self._rule_matcher = rule_matcher
        self._semantic_matcher = semantic_matcher
        self._llm_judge = llm_judge
        self._fusion_policy = fusion_policy
        self._config = config or FusionConfig()
    
    async def match_hybrid(
        self,
        message: str,
        rules: list[IntentRule],
        tenant_id: str,
        config: FusionConfig | None = None
    ) -> FusionResult:
        """
        Hybrid routing entry point.
        
        Flow:
        1. Run RuleMatcher and SemanticMatcher in parallel
        2. Decide whether to trigger LlmJudge
        3. Apply FusionPolicy
        4. Return the fused result
        
        Args:
            message: user message
            rules: rule list
            tenant_id: tenant ID
            config: fusion configuration (optional; overrides the default)
            
        Returns:
            FusionResult: the fused result
        """
        effective_config = config or self._config
        
        rule_result, semantic_result = await asyncio.gather(
            asyncio.to_thread(self._rule_matcher.match, message, rules),
            self._semantic_matcher.match(message, rules, tenant_id)
        )
        
        llm_result = None
        should_trigger, trigger_reason = self._llm_judge.should_trigger(
            rule_result, semantic_result, effective_config
        )
        
        if should_trigger:
            candidates = self._build_llm_candidates(rule_result, semantic_result)
            llm_result = await self._llm_judge.judge(
                LlmJudgeInput(
                    message=message,
                    candidates=candidates,
                    conflict_type=trigger_reason
                ),
                tenant_id
            )
        
        fusion_result = self._fusion_policy.fuse(
            rule_result, semantic_result, llm_result
        )
        
        return fusion_result
    
    def match(self, message: str, rules: list[IntentRule]) -> IntentMatchResult | None:
        """
        Original method, kept for backward compatibility.
        
        Args:
            message: user message
            rules: rule list
            
        Returns:
            IntentMatchResult | None: the match result
        """
        result = self._rule_matcher.match(message, rules)
        if result.rule:
            return IntentMatchResult(
                rule=result.rule,
                match_type=result.match_type,
                matched=result.matched_text
            )
        return None
    
    def _build_llm_candidates(
        self,
        rule_result: RuleMatchResult,
        semantic_result: SemanticMatchResult
    ) -> list[dict]:
        """Build the candidate list passed to the LLM judge."""
        candidates = []
        
        if rule_result.rule:
            candidates.append({
                "id": str(rule_result.rule_id),
                "name": rule_result.rule.name,
                "description": f"match type: {rule_result.match_type}, matched text: {rule_result.matched_text}"
            })
        
        for candidate in semantic_result.candidates[:3]:
            if not any(c["id"] == str(candidate.rule.id) for c in candidates):
                candidates.append({
                    "id": str(candidate.rule.id),
                    "name": candidate.rule.name,
                    "description": f"semantic similarity: {candidate.score:.2f}"
                })
        
        return candidates

4. Fusion Formula and Default Thresholds

4.1 Fusion Formula

final_confidence = (w_rule * rule_score + w_semantic * semantic_score + w_llm * llm_score) / total_weight

where:

  • total_weight = w_rule + w_semantic + (w_llm if llm_triggered else 0)
  • the result is clamped to the range [0.0, 1.0]
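
To make the formula concrete, here is a small worked sketch with the default weights. The function name and the convention of passing `llm_score=None` when the judge was not triggered are assumptions for the example.

```python
from __future__ import annotations


def fused_confidence(
    rule_score: float,
    semantic_score: float,
    llm_score: float | None = None,  # None means the LLM judge was not triggered
    *,
    w_rule: float = 0.5,
    w_semantic: float = 0.3,
    w_llm: float = 0.2,
) -> float:
    """Weighted average of the available scores, clamped to [0.0, 1.0]."""
    total_weight = w_rule + w_semantic
    weighted_sum = w_rule * rule_score + w_semantic * semantic_score
    if llm_score is not None:
        # The LLM weight only enters the divisor when the judge actually ran.
        total_weight += w_llm
        weighted_sum += w_llm * llm_score
    return min(1.0, max(0.0, weighted_sum / total_weight))
```

With a rule hit (1.0) and a semantic score of 0.8, the result without the LLM is (0.5 + 0.24) / 0.8, about 0.925. If the judge also ran and returned 0.9, the result is (0.5 + 0.24 + 0.18) / 1.0, about 0.92.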

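4.2 Default Threshold Configuration

DEFAULT_FUSION_CONFIG = FusionConfig(
    w_rule=0.5,                    # rule weight
    w_semantic=0.3,                # semantic weight
    w_llm=0.2,                     # LLM weight
    semantic_threshold=0.7,        # high-confidence threshold for semantic matching
    conflict_threshold=0.2,        # conflict threshold (confidence difference)
    gray_zone_threshold=0.6,       # upper bound of the gray zone
    min_trigger_threshold=0.3,     # lower bound of the gray zone
    clarify_threshold=0.4,         # threshold below which clarification is requested
    multi_intent_threshold=0.15,   # multi-intent threshold
    llm_judge_enabled=True,        # enable LLM arbitration
    semantic_matcher_enabled=True, # enable semantic matching
)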

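4.3 Decision Priority

Priority | Decision reason | Condition
1 | rule_high_confidence | RuleMatcher hit with score=1.0
2 | llm_judge | LlmJudge was triggered and returned a valid intent
3 | semantic_override | RuleMatcher missed but SemanticMatcher is high-confidence
4 | rule_semantic_agree | Rule and semantic matching agree on the same intent
5 | semantic_fallback | SemanticMatcher is medium-confidence
6 | rule_fallback | Rule match only
7 | no_match | All three paths are low-confidence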
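
The priority table is evaluated top to bottom and the first satisfied condition wins. The sketch below mirrors the conditions of `FusionPolicy.DECISION_PRIORITY` over a flattened, hypothetical `Signals` record (not a type from the design), so the ordering can be exercised without the full pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical flattened view of the three matcher outputs."""
    rule_score: float
    rule_id: str | None
    semantic_top_id: str | None
    semantic_top_score: float
    llm_intent_id: str | None = None  # set only when the LLM judge returned an intent


DECISION_PRIORITY = [
    ("rule_high_confidence", lambda s: s.rule_score == 1.0 and s.rule_id is not None),
    ("llm_judge", lambda s: s.llm_intent_id is not None),
    ("semantic_override", lambda s: s.rule_score == 0 and s.semantic_top_score > 0.7),
    ("rule_semantic_agree", lambda s: s.rule_score > 0
        and s.semantic_top_score > 0.5 and s.rule_id == s.semantic_top_id),
    ("semantic_fallback", lambda s: s.semantic_top_score > 0.5),
    ("rule_fallback", lambda s: s.rule_score > 0),
    ("no_match", lambda s: True),
]


def decide(signals: Signals) -> str:
    """Return the name of the first condition that holds."""
    return next(name for name, holds in DECISION_PRIORITY if holds(signals))
```

One consequence worth noting: because RuleMatcher only emits 0.0 or 1.0, any rule hit already satisfies priority 1, so priorities 4 and 6 would only become reachable if rule scores were graded.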

5. Exception Handling

Failure scenario | Handling | Observability
Embedding call fails | Skip SemanticMatcher; use RuleMatcher only | semantic_match.skipped=true, skip_reason="embedding_error: ..."
Embedding times out | Skip SemanticMatcher; use RuleMatcher only | semantic_match.skipped=true, skip_reason="embedding_timeout"
Qdrant retrieval fails | Skip SemanticMatcher; use RuleMatcher only | semantic_match.skipped=true, skip_reason="qdrant_error"
LLM call fails | Fall back to the SemanticMatcher result | llm_judge.reasoning="LLM error: ..."
LLM times out | Fall back to the SemanticMatcher result | llm_judge.reasoning="LLM timeout"
LLM response cannot be parsed | Fall back to the SemanticMatcher result | llm_judge.reasoning="parse_failed"
Rule cache invalidated | Reload from the database | rule_match.cache_miss=true
Configuration missing | Use the default configuration | fusion.using_default_config=true
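
The "skip on failure" rows can be sketched as a small wrapper around the embedding call. The skip-reason strings follow the ones used in `SemanticMatcher.match` in section 3.3; the wrapper name and the three fake providers are assumptions made for the demonstration.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

Embed = Callable[[str], Awaitable[list[float]]]


async def embed_or_skip(
    embed: Embed, message: str, timeout_ms: int
) -> tuple[list[float] | None, str | None]:
    """Return (vector, None) on success or (None, skip_reason) on failure."""
    try:
        vector = await asyncio.wait_for(embed(message), timeout=timeout_ms / 1000)
        return vector, None
    except asyncio.TimeoutError:
        return None, "embedding_timeout"
    except Exception as exc:  # any provider error degrades to rule-only routing
        return None, f"embedding_error: {exc}"


# Fake providers used only to exercise the three outcomes.
async def fast_embed(message: str) -> list[float]:
    return [0.1, 0.2]


async def slow_embed(message: str) -> list[float]:
    await asyncio.sleep(0.5)
    return [0.0]


async def broken_embed(message: str) -> list[float]:
    raise RuntimeError("boom")
```

Running `asyncio.run(embed_or_skip(slow_embed, "hi", 20))` returns `(None, "embedding_timeout")`, which a caller can map onto `skipped=True` without raising.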

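6. Data Model Changes

6.1 IntentRule Entity Extension

# Location: app/models/entities.py

from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel

class IntentRule(SQLModel, table=True):
    # ... existing fields ...
    
    intent_vector: list[float] | None = Field(
        default=None,
        sa_column=Column(JSONB, nullable=True, comment="Intent vector (precomputed)")
    )
    
    semantic_examples: list[str] | None = Field(
        default=None,
        sa_column=Column(JSONB, nullable=True, comment="List of semantic example sentences")
    )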

6.2 ChatMessage Entity Extension

# Location: app/models/entities.py

from typing import Any

class ChatMessage(SQLModel, table=True):
    # ... existing fields ...
    
    route_trace: dict[str, Any] | None = Field(
        default=None,
        sa_column=Column(JSONB, nullable=True, comment="Intent route trace log")
    )

7. API Design

7.1 Fusion Configuration API

GET /admin/intent-rules/fusion-config
PUT /admin/intent-rules/fusion-config
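
The document does not specify the payload or the validation rules for the PUT endpoint. As one possible sketch, a handler could reject configurations that would break the fusion formula or the gray-zone check before saving them. The function name and the specific checks are assumptions, derived only from how the fields are used in sections 3 and 4.

```python
from __future__ import annotations

DEFAULTS = {
    "w_rule": 0.5, "w_semantic": 0.3, "w_llm": 0.2,
    "semantic_threshold": 0.7, "conflict_threshold": 0.2,
    "gray_zone_threshold": 0.6, "min_trigger_threshold": 0.3,
    "clarify_threshold": 0.4, "multi_intent_threshold": 0.15,
}


def validate_fusion_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config is usable."""
    errors = []
    for name in ("w_rule", "w_semantic", "w_llm"):
        if cfg[name] < 0:
            errors.append(f"{name} must be non-negative")
    # total_weight is the divisor in the fusion formula, so it must be positive.
    if cfg["w_rule"] + cfg["w_semantic"] <= 0:
        errors.append("w_rule + w_semantic must be positive")
    for name, value in cfg.items():
        if name.endswith("_threshold") and not 0.0 <= value <= 1.0:
            errors.append(f"{name} must be within [0.0, 1.0]")
    # The gray zone is the open interval (min_trigger, gray_zone); it must not be empty.
    if cfg["min_trigger_threshold"] >= cfg["gray_zone_threshold"]:
        errors.append("min_trigger_threshold must be below gray_zone_threshold")
    return errors
```

The defaults from section 4.2 pass with an empty error list; swapping the two gray-zone bounds or supplying a negative weight produces a message for each problem.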

7.2 Intent Vector Generation API

POST /admin/intent-rules/{id}/generate-vector

7.3 Monitoring API Extension

GET /admin/monitoring/conversations/{id}

The response gains a new route_trace field.
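
For orientation, the shape of `route_trace` can be read off the `RouteTrace` object built in `FusionPolicy.fuse`. The sketch below assembles one such payload and serializes it; the keys come from section 3.5, while every value (including the rule ID and name) is an illustrative placeholder.

```python
import json

# Keys mirror the RouteTrace assembled in FusionPolicy.fuse; values are made up.
route_trace = {
    "rule_match": {
        "rule_id": None,
        "match_type": None,
        "matched_text": None,
        "score": 0.0,
        "duration_ms": 1,
    },
    "semantic_match": {
        "top_candidates": [
            {"rule_id": "example-rule-id", "name": "refund", "score": 0.82},
        ],
        "top_score": 0.82,
        "duration_ms": 37,
        "skipped": False,
        "skip_reason": None,
    },
    "llm_judge": {
        "triggered": False,
        "intent_id": None,
        "score": 0.0,
        "duration_ms": 0,
        "tokens_used": 0,
    },
    "fusion": {
        "weights": {"w_rule": 0.5, "w_semantic": 0.3, "w_llm": 0.2},
        "final_confidence": 0.82,
        "decision_reason": "semantic_override",
    },
}

payload = json.dumps(route_trace, ensure_ascii=False)
```

Since every value is a plain JSON type, the same dictionary can be stored in the JSONB `route_trace` column and returned by the monitoring endpoint unchanged.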


8. Performance Optimization

8.1 Parallel Execution

RuleMatcher and SemanticMatcher run in parallel via asyncio.gather.
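
A minimal sketch of that pattern: the synchronous rule matcher is moved to a worker thread with `asyncio.to_thread` so it can overlap with the asynchronous semantic matcher. Both matcher functions here are stand-ins that only sleep; their names and return values are assumptions.

```python
import asyncio
import time


def rule_match(message: str) -> str:
    """Stand-in for the synchronous RuleMatcher.match; sleeps to simulate work."""
    time.sleep(0.05)
    return f"rule:{message}"


async def semantic_match(message: str) -> str:
    """Stand-in for the asynchronous SemanticMatcher.match."""
    await asyncio.sleep(0.05)
    return f"semantic:{message}"


async def match_both(message: str) -> tuple[str, str]:
    # gather preserves argument order, so unpacking stays stable.
    rule_result, semantic_result = await asyncio.gather(
        asyncio.to_thread(rule_match, message),
        semantic_match(message),
    )
    return rule_result, semantic_result
```

`asyncio.run(match_both("hi"))` returns `("rule:hi", "semantic:hi")`. Because the two calls overlap, the wall-clock time is roughly that of the slower one rather than the sum.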

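8.2 Timeout Control

  • SemanticMatcher timeout: 100 ms
  • LlmJudge timeout: 2000 ms

8.3 Caching Strategy

  • Rule cache: reuse the existing RuleCache, TTL=60s
  • Vector cache: optional; to be considered in a later iteration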
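
The vector cache is left for a later iteration, so nothing here is part of the current design. Purely as an illustration of what it might look like, the sketch below is a small in-memory TTL cache keyed by `(tenant_id, rule_id)`, which would keep cached example embeddings tenant-isolated. The class name, key layout, and injectable clock are all assumptions.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Hashable


class TtlCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._entries[key]
            return None
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
```

A caller would look up `cache.get((tenant_id, rule_id))` before embedding a rule's `semantic_examples` and call `put` after a miss; including `tenant_id` in the key means one tenant never reads another tenant's vectors.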

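9. Risks and Open Questions

9.1 Risks

Risk | Level | Mitigation
Embedding calls add latency | | Set a timeout; skip on timeout
Frequent LLM Judge triggering raises cost | | Configure sensible trigger thresholds
Semantic vector configuration is complex | | Provide an auto-generation API

9.2 Open Questions

  • Does the intent vector index need its own Qdrant collection?
  • Does LlmJudge token consumption need separate billing statistics?
  • Does the fusion configuration need to support per-tenant overrides?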